I.0. Introduction


I.0.1. This is the first chapter of the first part of an extended work dedicated to the presentation of the author's original results, as yet unpublished except for [14], concerning the concept of information in its most general aspects.

This work develops the ideas presented by the author in his talk [15] at the Sixth International Symposium on Multivariate Analysis, July 25-29, 1983, organized by the Center for Multivariate Analysis, University of Pittsburgh. The work was partially sponsored by the Center, and the author expresses his sincere appreciation for this to Professor P. R. Krishnaiah and Professor C. R. Rao.

Basically, this work can be regarded as a development and continuation of the work done in this direction by A. N. Kolmogorov, I. M. Gelfand, A. M. Yaglom, R. L. Dobrushin, M. S. Pinsker and G. Kallianpur between 1956 and 1960, as well as a continuation of the author's own work between 1956 and 1982.

I.0.2. This chapter has an introductory character, covering some of the preliminaries needed in the following chapters. It is divided into four subchapters. The first subchapter discusses the problems connected with the definition of the concept of relative entropy, the second some elementary properties of this concept, the third some additivity theorems, and the fourth a generalization of this concept.

Unlike the following chapters, this chapter presents a number of results obtained by other authors. Because these results are spread over various publications, some of them difficult to find and written in various languages, and because no reference book exists, they are presented here.




It should be remarked that some of the original proofs of these results are exceedingly difficult to follow: some give only indications of how the proofs should go, some are complicated without any reason, and some contain unnecessary restrictions. Some results are given without any proof at all.

For all these reasons, the author presents here complete, straightforward proofs for all the results discussed; sometimes the results are presented with better proofs, sometimes with new proofs, and sometimes the results themselves are improved. The comments at the end of the chapter will indicate the author's contribution to the proof, or to its improvement, where this is the case.

The third and fourth subchapters contain mainly results belonging to the author, some presented without proofs in [12], [13], others completely new.

I.0.3. Harold Jeffreys, professor of astronomy at the University of Cambridge, England, introduced the concept of relative entropy into the literature. In a paper [5] presented for publication in 1974 he defines the quantity which, in our notation, is

$$h(\xi:\eta) + h(\eta:\xi),$$

as a measure of discrepancy between the probability distributions of the random variables $\xi$, $\eta$. In the second edition of his "Theory of Probability" [6], he continues to discuss the properties and uses of this quantity. It should be remarked that he did not give this concept any name.

Beginning in 1951, S. Kullback started a sustained research effort, together with various associates, to solve a series of statistical problems with the help of this concept [9], [10].
Claude Shannon introduced in 1948 the concept of the entropy of a random variable and the concept of the quantity of information contained in one random variable about another, as basic concepts of information theory [16], [17].

It is easy to recognize that the concept of entropy and the concept of quantity of information can be obtained as particular cases of the concept of relative entropy.

It is from these inter-relations that Jeffreys' concept took its name, while Jeffreys' own name was forgotten in the meantime.


I.1. The definition of relative entropy

I.1.1. Let $\xi$, $\eta$ be two random vectors with the same values $x_i$ $(1\le i\le n)$, and let

$$P_\xi(x_i) = P(\xi = x_i),\qquad P_\eta(x_i) = P(\eta = x_i)\qquad(1\le i\le n). \tag{I.1.1.1}$$

The relative entropy of $\xi$ with respect to $\eta$, or of $P_\xi$ with respect to $P_\eta$, is given by the expression

$$h(\xi:\eta) = h(P_\xi:P_\eta) = \sum_{i=1}^{n}P_\xi(x_i)\log\frac{P_\xi(x_i)}{P_\eta(x_i)}, \tag{I.1.1.2}$$

where for $a\ge 0$ we take $0\log\frac{0}{a} = 0$.
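The finite-valued definition (I.1.1.2) can be sketched in Python as below. This is a minimal illustration, not the author's code. It assumes the natural logarithm (the text does not fix the base), and, as an extra assumption not stated here, it returns infinity when some $P_\xi(x_i) > 0$ meets $P_\eta(x_i) = 0$. The function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(p, q):
    """Return h(P:Q) = sum_i p_i * log(p_i / q_i) for two finite distributions.

    Uses the natural logarithm and the convention 0 * log(0 / a) = 0.
    Assumption (not stated in the text): a term with p_i > 0 and q_i == 0
    makes the sum +infinity.
    """
    if len(p) != len(q):
        raise ValueError("p and q must list the same values x_1, ..., x_n")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention 0 * log(0 / a) = 0
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Example: xi uniform on two values, eta giving them probabilities 1/4, 3/4.
print(relative_entropy([0.5, 0.5], [0.25, 0.75]))
```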

I.1.2. Let now $(\Omega, \Sigma, P)$ be a probability space, where $\Omega$ is a set of elements $\omega$, $\Sigma$ a $\sigma$-algebra of subsets of $\Omega$, and $P$ a probability measure on $\Sigma$.

We consider two random vectors $\xi$, $\eta$ defined on this probability space, with values in the measure space $(X, S, \mu)$, where $X$ is a set of elements $x$, $S$ a $\sigma$-algebra of subsets of $X$, and $\mu$ a measure on $S$. Let

$$P_\xi(T) = P\{\omega;\ \xi(\omega)\in T\},\qquad P_\eta(T) = P\{\omega;\ \eta(\omega)\in T\},\qquad T\in S \tag{I.1.2.1}$$

be their probability measures.


Let $Z = \{Z_s\}$ be an S-measurable partition of $X$, i.e. a finite family of S-measurable non-overlapping sets $Z_s$ whose union is $X$. Let us denote by $U$ the set of all S-measurable partitions $Z$ of $X$.

For any given random vectors $\xi$, $\eta$ and for any S-measurable partition $Z$ of $X$, we define two finite-valued random vectors $\xi_Z$, $\eta_Z$, both with the same values $s = 1, 2, \dots, k$, where $k$ is the number of elements in the partition $Z$, and such that

$$P_{\xi_Z}(s) = P_\xi(Z_s),\qquad P_{\eta_Z}(s) = P_\eta(Z_s)\qquad(1\le s\le k). \tag{I.1.2.2}$$


By (I.1.1.2), the relative entropy of $\xi_Z$ with respect to $\eta_Z$ is

$$h(\xi_Z:\eta_Z) = h(P_{\xi_Z}:P_{\eta_Z}) = \sum_{s=1}^{k}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}. \tag{I.1.2.3}$$

I.1.3. Lemma I.1.

Let $P_i > 0$, $Q_i > 0$ $(1\le i\le n)$ and


$$P = \sum_{i=1}^{n}P_i,\qquad Q = \sum_{i=1}^{n}Q_i. \tag{I.1.3.1}$$

Then

$$P\log\frac{P}{Q} \le \sum_{i=1}^{n}P_i\log\frac{P_i}{Q_i}, \tag{I.1.3.2}$$

with equality iff

$$\frac{P_i}{Q_i} = \frac{P}{Q}\qquad(1\le i\le n). \tag{I.1.3.3}$$


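Proof.

Let $\varphi(t)$ be a real-valued continuous convex function defined on an interval containing the real numbers $t_i$ $(1\le i\le n)$, and let $a_i > 0$ $(1\le i\le n)$ with $\sum_{i=1}^{n}a_i = 1$. By Jensen's inequality

$$\varphi\Big(\sum_{i=1}^{n}a_it_i\Big) \le \sum_{i=1}^{n}a_i\varphi(t_i), \tag{I.1.3.4}$$

where, if $\varphi$ is strictly convex, equality takes place iff

$$t_1 = t_2 = \dots = t_n. \tag{I.1.3.5}$$

Taking in (I.1.3.4)

$$\varphi(t) = t\log t,\qquad a_i = \frac{Q_i}{Q},\qquad t_i = \frac{P_i}{Q_i}\qquad(1\le i\le n), \tag{I.1.3.6}$$

which is strictly convex on $(0,\infty)$ because $\varphi''(t) = 1/t > 0$, we have $\sum_i a_it_i = P/Q$ and $\sum_i a_i\varphi(t_i) = \frac{1}{Q}\sum_i P_i\log\frac{P_i}{Q_i}$, so we obtain (I.1.3.2); from (I.1.3.5) we obtain $t_i = P/Q$, i.e. (I.1.3.3).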
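Lemma I.1 is what is often called the log-sum inequality. A quick numerical check of (I.1.3.2) and of the equality case (I.1.3.3), written as a sketch with a hypothetical helper name and random positive inputs:

```python
import math
import random


def log_sum_gap(p, q):
    """Right side minus left side of (I.1.3.2):
    sum_i p_i log(p_i / q_i) - P log(P / Q). All p_i, q_i must be > 0."""
    P, Q = sum(p), sum(q)
    rhs = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))
    lhs = P * math.log(P / Q)
    return rhs - lhs


rng = random.Random(0)
for _ in range(1000):
    n = rng.randint(1, 6)
    p = [rng.uniform(0.01, 2.0) for _ in range(n)]
    q = [rng.uniform(0.01, 2.0) for _ in range(n)]
    assert log_sum_gap(p, q) >= -1e-12   # (I.1.3.2), up to rounding

# Equality case (I.1.3.3): all ratios p_i / q_i equal (here to 3).
print(log_sum_gap([3.0, 6.0, 1.5], [1.0, 2.0, 0.5]))
```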

I.1.4. Let $Z, Z'\in U$. We say that $Z'$ is a subpartition of $Z$ if each element of $Z$ can be represented as the union of some elements of $Z'$. The set $U$ is partially ordered by this relation, which we denote by $Z' < Z$. Indeed:

a) $Z' < Z$ and $Z < Z'$ imply that $Z$, $Z'$ are identical;

b) $Z'' < Z'$ and $Z' < Z$ imply that $Z'' < Z$.

Moreover:

c) for any $Z', Z''\in U$ there exists an element $Z\in U$ such that $Z < Z'$, $Z < Z''$.

Indeed, if $Z' = \{Z'_s\}$, $Z'' = \{Z''_{s'}\}$, we may take $Z = \{Z_{s,s'}\}$ with

$$Z_{s,s'} = Z'_s\cap Z''_{s'}.$$
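For a finite set $X$ the construction in c) can be tried out directly. The sketch below, with hypothetical function names, represents a partition as a list of frozensets and, as an added assumption, drops the empty intersections $Z'_s\cap Z''_{s'}$.

```python
from itertools import product


def common_refinement(z1, z2):
    """Elements Z'_s & Z''_s' of a common subpartition (empty ones dropped)."""
    return [a & b for a, b in product(z1, z2) if a & b]


def is_subpartition(fine, coarse):
    """True if every element of `coarse` is a union of elements of `fine`."""
    return all(
        frozenset().union(*[f for f in fine if f <= c]) == c for c in coarse
    )


z1 = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
z2 = [frozenset({0, 3}), frozenset({1, 2, 4, 5})]
z = common_refinement(z1, z2)
print(sorted(sorted(s) for s in z))                   # [[0], [1, 2], [3], [4, 5]]
print(is_subpartition(z, z1), is_subpartition(z, z2))  # True True
```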

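I.1.5. Lemma I.2

If $Z, Z'\in U$ and $Z' < Z$, then

$$h(\xi_Z:\eta_Z) \le h(\xi_{Z'}:\eta_{Z'}). \tag{I.1.5.1}$$

Proof. Suppose $Z$ consists of elements $Z_s\in S$ $(1\le s\le n)$ and $Z'$ consists of elements $Z'_{s'}\in S$ $(1\le s'\le n')$ $(n\le n')$. Since $Z' < Z$,

$$Z_s = \bigcup_{s'\in L_s}Z'_{s'}\qquad(1\le s\le n), \tag{I.1.5.2}$$

where $L_s$ is some subset of $\{1, 2, \dots, n'\}$, such that $L_s$, $L_t$ do not overlap if $s\ne t$, and the union of all $L_s$ is $\{1, 2, \dots, n'\}$. Then

$$P_\xi(Z_s) = \sum_{s'\in L_s}P_\xi(Z'_{s'}),\qquad P_\eta(Z_s) = \sum_{s'\in L_s}P_\eta(Z'_{s'}). \tag{I.1.5.3}$$

From Lemma I.1 it follows that

$$\sum_{s'\in L_s}P_\xi(Z'_{s'})\log\frac{P_\xi(Z'_{s'})}{P_\eta(Z'_{s'})} \ge \Big(\sum_{s'\in L_s}P_\xi(Z'_{s'})\Big)\log\frac{\sum_{s'\in L_s}P_\xi(Z'_{s'})}{\sum_{s'\in L_s}P_\eta(Z'_{s'})} = P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}\qquad(1\le s\le n), \tag{I.1.5.4}$$

so that

$$\sum_{s=1}^{n}\sum_{s'\in L_s}P_\xi(Z'_{s'})\log\frac{P_\xi(Z'_{s'})}{P_\eta(Z'_{s'})} \ge \sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}, \tag{I.1.5.5}$$

$$\sum_{s'=1}^{n'}P_\xi(Z'_{s'})\log\frac{P_\xi(Z'_{s'})}{P_\eta(Z'_{s'})} \ge \sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}, \tag{I.1.5.6}$$

i.e. (I.1.5.1).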
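Lemma I.2 says that merging elements of a partition can only decrease $h$. A small numerical illustration, with example masses and a helper `coarsen` that are hypothetical; the index lists play the role of the sets $L_s$ in (I.1.5.2).

```python
import math


def rel_ent(p, q):
    """sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def coarsen(masses, blocks):
    """Masses of the coarser partition: element s is the union of cells in blocks[s]."""
    return [sum(masses[i] for i in block) for block in blocks]


p_fine = [0.1, 0.2, 0.3, 0.4]   # P_xi(Z'_s'), s' = 1..4
q_fine = [0.4, 0.3, 0.2, 0.1]   # P_eta(Z'_s')
blocks = [[0, 1], [2, 3]]       # L_1 = {1, 2}, L_2 = {3, 4} (0-based indices here)

h_fine = rel_ent(p_fine, q_fine)
h_coarse = rel_ent(coarsen(p_fine, blocks), coarsen(q_fine, blocks))
print(h_coarse <= h_fine)  # True
```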

Definition I.1. The quantity

$$h(\xi:\eta) = \sup_{Z\in U}h(\xi_Z:\eta_Z) \tag{I.1.5.7}$$

is the relative entropy of $\xi$ with respect to $\eta$, or of $P_\xi$ with respect to $P_\eta$.

Theorem I.1. For arbitrary random vectors $\xi$, $\eta$

$$h(\xi:\eta) \ge 0 \tag{I.1.5.8}$$

with equality iff

$$P_\xi = P_\eta. \tag{I.1.5.9}$$

Proof. If $\xi$, $\eta$ are finite valued, let us consider the result of Lemma I.1 with $P_i = P_\xi(x_i)$, $Q_i = P_\eta(x_i)$, $P = Q = 1$. From (I.1.3.2) it follows (I.1.5.8), and from (I.1.3.3) it follows $P_\xi(x_i) = P_\eta(x_i)$ $(1\le i\le n)$. From Definition I.1 the results (I.1.5.8), (I.1.5.9) follow in general.
I.1.6. In what follows, we will need the following result, in which we denote

$$E_1\triangle E_2 = (E_1 - E_2)\cup(E_2 - E_1).$$


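Lemma I.3.

a) If $\lambda_k$ $(1\le k\le r)$ is a sequence of $\sigma$-finite measures on an algebra $L$ which generates the $\sigma$-algebra $S$, then for any set $E\in S$ for which $\lambda_k(E) < \infty$ $(1\le k\le r)$ and for any positive number $\varepsilon > 0$ there exists a set $E_0\in L$ such that

$$\lambda_k(E\triangle E_0) \le \varepsilon\qquad(1\le k\le r). \tag{I.1.6.1}$$

b) Moreover, if $E^{(j)}$ $(1\le j\le m)$ are non-overlapping sets belonging to $S$, of finite $\lambda_k$-measure, then there exist non-overlapping sets $E_0^{(j)}\in L$ $(1\le j\le m)$ such that

$$\lambda_k\Big(\bigcup_{j=1}^{m}E^{(j)}\ \triangle\ \bigcup_{j=1}^{m}E_0^{(j)}\Big) \le m\varepsilon\qquad(1\le k\le r). \tag{I.1.6.2}$$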


Proof. We first prove the above lemma in the case $r = 1$, and we denote $\lambda_1 = \lambda$. The proof of this case is performed in two steps.

The first part of this proof repeats that of Theorem D, p. 56 in [4]. We reproduce it here because its intermediate results are needed in the second part of our proof. Because $E\in S$, it follows that

$$\lambda(E) = \inf\Big\{\sum_{i=1}^{\infty}\lambda(E_i);\ E\subset\bigcup_{i=1}^{\infty}E_i,\ E_i\in L;\ i = 1, 2, \dots\Big\}, \tag{I.1.6.3}$$

so that there exists a sequence $\{E_i\}$ of sets in $L$ such that

$$E\subset\bigcup_{i=1}^{\infty}E_i \tag{I.1.6.4}$$

and

$$\lambda\Big(\bigcup_{i=1}^{\infty}E_i\Big) \le \lambda(E) + \frac{\varepsilon}{2}. \tag{I.1.6.5}$$


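Since

$$\lim_{n\to\infty}\lambda\Big(\bigcup_{i=1}^{n}E_i\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}E_i\Big), \tag{I.1.6.6}$$

there exists a positive integer $N$ such that for any $n > N$, considering the set

$$E_0 = \bigcup_{i=1}^{n}E_i, \tag{I.1.6.7}$$

it follows that

$$\lambda\Big(\bigcup_{i=1}^{\infty}E_i\Big) \le \lambda(E_0) + \frac{\varepsilon}{2}. \tag{I.1.6.8}$$

Obviously

$$E_0\in L. \tag{I.1.6.9}$$

Because

$$\lambda(E - E_0) \le \lambda\Big(\bigcup_{i=1}^{\infty}E_i - E_0\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}E_i\Big) - \lambda(E_0) \le \frac{\varepsilon}{2}, \tag{I.1.6.10}$$

$$\lambda(E_0 - E) \le \lambda\Big(\bigcup_{i=1}^{\infty}E_i - E\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}E_i\Big) - \lambda(E) \le \frac{\varepsilon}{2}, \tag{I.1.6.11}$$

it follows that

$$\lambda(E\triangle E_0) = \lambda(E - E_0) + \lambda(E_0 - E) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \tag{I.1.6.12}$$

i.e. (I.1.6.1).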

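The second part of the proof uses these results. Let us consider $m$ non-overlapping sets $E^{(j)}\in S$ $(1\le j\le m)$ of finite measure. Then, as in (I.1.6.4), (I.1.6.5), it is possible to find for each $E^{(j)}$ a sequence $\{E_i^{(j)}\}$ of sets in $L$ such that

$$E^{(j)}\subset\bigcup_{i=1}^{\infty}E_i^{(j)}, \tag{I.1.6.4'}$$

$$\lambda\Big(\bigcup_{i=1}^{\infty}E_i^{(j)}\Big) \le \lambda(E^{(j)}) + \frac{\varepsilon}{2}, \tag{I.1.6.5'}$$

and, because $E^{(j)}\cap E^{(k)} = \emptyset$ $(j\ne k)$, we can choose the sets $E_i^{(j)}$ so that

$$E_e^{(j)}\cap E_r^{(k)} = \emptyset\quad(j\ne k);\qquad e, r = 1, 2, \dots;\quad j, k = 1, 2, \dots, m. \tag{I.1.6.13}$$

As in (I.1.6.6), it follows that

$$\lim_{n\to\infty}\lambda\Big(\bigcup_{i=1}^{n}E_i^{(j)}\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}E_i^{(j)}\Big)\qquad(1\le j\le m), \tag{I.1.6.6'}$$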


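and so there exists a positive integer $N_j$ such that, denoting

$$E_0^{(j)} = \bigcup_{i=1}^{n}E_i^{(j)}, \tag{I.1.6.7'}$$

for any $n > N_j$ we have

$$\lambda\Big(\bigcup_{i=1}^{\infty}E_i^{(j)}\Big) \le \lambda(E_0^{(j)}) + \frac{\varepsilon}{2}\qquad(1\le j\le m). \tag{I.1.6.8'}$$

We take in (I.1.6.7') the same $n > N$ for all $j$, where

$$N = \max_{1\le j\le m}N_j. \tag{I.1.6.14}$$

From (I.1.6.7') it follows

$$\bigcup_{j=1}^{m}E_0^{(j)} = \bigcup_{j=1}^{m}\bigcup_{i=1}^{n}E_i^{(j)}. \tag{I.1.6.15}$$

From (I.1.6.5') it follows that

$$\sum_{j=1}^{m}\lambda\Big(\bigcup_{i=1}^{\infty}E_i^{(j)}\Big) \le \sum_{j=1}^{m}\lambda(E^{(j)}) + m\frac{\varepsilon}{2}, \tag{I.1.6.5''}$$

and from (I.1.6.13) this inequality can be written as

$$\lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)}\Big) \le \lambda\Big(\bigcup_{j=1}^{m}E^{(j)}\Big) + m\frac{\varepsilon}{2}. \tag{I.1.6.5'''}$$

From (I.1.6.8') it follows that

$$\sum_{j=1}^{m}\lambda\Big(\bigcup_{i=1}^{\infty}E_i^{(j)}\Big) \le \sum_{j=1}^{m}\lambda(E_0^{(j)}) + m\frac{\varepsilon}{2}, \tag{I.1.6.8''}$$

and from (I.1.6.13) this inequality can be written as

$$\lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)}\Big) \le \lambda\Big(\bigcup_{j=1}^{m}E_0^{(j)}\Big) + m\frac{\varepsilon}{2}. \tag{I.1.6.8'''}$$

Obviously,

$$\bigcup_{j=1}^{m}E^{(j)}\subset\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)},\qquad\bigcup_{j=1}^{m}E_0^{(j)}\subset\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)}. \tag{I.1.6.16}$$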


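From (I.1.6.8''') it follows that

$$\lambda\Big(\bigcup_{j=1}^{m}E^{(j)} - \bigcup_{j=1}^{m}E_0^{(j)}\Big) \le \lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)} - \bigcup_{j=1}^{m}E_0^{(j)}\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)}\Big) - \lambda\Big(\bigcup_{j=1}^{m}E_0^{(j)}\Big) \le m\frac{\varepsilon}{2}. \tag{I.1.6.17}$$

From (I.1.6.5''') it follows that

$$\lambda\Big(\bigcup_{j=1}^{m}E_0^{(j)} - \bigcup_{j=1}^{m}E^{(j)}\Big) \le \lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)} - \bigcup_{j=1}^{m}E^{(j)}\Big) = \lambda\Big(\bigcup_{i=1}^{\infty}\bigcup_{j=1}^{m}E_i^{(j)}\Big) - \lambda\Big(\bigcup_{j=1}^{m}E^{(j)}\Big) \le m\frac{\varepsilon}{2}. \tag{I.1.6.18}$$

From (I.1.6.17), (I.1.6.18) it follows that

$$\lambda\Big(\bigcup_{j=1}^{m}E^{(j)}\ \triangle\ \bigcup_{j=1}^{m}E_0^{(j)}\Big) \le m\varepsilon, \tag{I.1.6.19}$$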


so that our lemma is proved for $r = 1$. In order to prove it for $r > 1$, let us consider in the above results

$$\lambda = \sum_{k=1}^{r}\lambda_k. \tag{I.1.6.20}$$

From

$$\lambda_k(E) \le \lambda(E) \tag{I.1.6.21}$$

the general result stated in Lemma I.3 follows.


I.1.7. Lemma I.4.

Let $\mu$ be some finite measure on $(X, S)$ and $A, B\in S$. Then

$$|\mu(A) - \mu(B)| \le \mu(A\triangle B). \tag{I.1.7.1}$$

Proof. Because

$$\mu(A) = \mu(A\cap B) + \mu(A - B),\qquad\mu(B) = \mu(A\cap B) + \mu(B - A),$$

it follows that

$$\mu(A) - \mu(B) = \mu(A - B) - \mu(B - A),$$

so that

$$|\mu(A) - \mu(B)| = |\mu(A - B) - \mu(B - A)| \le \mu(A - B) + \mu(B - A) = \mu(A\triangle B).$$
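Inequality (I.1.7.1) can be checked on a finite space. As an example chosen only for illustration, the sketch below takes a finite measure given by random point weights; Python's `^` operator on sets is exactly the symmetric difference $A\triangle B$. The helper name `measure` is hypothetical.

```python
import random


def measure(weights, subset):
    """Finite measure mu(A) = sum of point weights over A, on X = {0, ..., n-1}."""
    return sum(weights[x] for x in subset)


rng = random.Random(1)
n = 10
weights = [rng.uniform(0.0, 1.0) for _ in range(n)]

for _ in range(500):
    A = {x for x in range(n) if rng.random() < 0.5}
    B = {x for x in range(n) if rng.random() < 0.5}
    # A ^ B is the symmetric difference (A - B) | (B - A).
    lhs = abs(measure(weights, A) - measure(weights, B))
    assert lhs <= measure(weights, A ^ B) + 1e-12
print("Lemma I.4 holds on all sampled pairs")
```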

We say that an algebra $L$ of S-measurable sets generates the $\sigma$-algebra $S$ if $S$ is the smallest $\sigma$-algebra such that $L\subset S$.

Theorem I.2.

Let:

a) $L$ be an algebra of S-measurable sets which generates $S$;

b) $R$ be a family of S-measurable partitions of $X$.

If any partition consisting of sets from $L$ has a subpartition belonging to $R$, then

$$h(\xi:\eta) = \sup_{Z\in R}h(\xi_Z:\eta_Z). \tag{I.1.7.2}$$

Proof. Let $U_L\subset U$ be the totality of partitions $Z$ of $X$ whose elements belong to the algebra $L$ of S-measurable sets, and let

$$h_L(\xi:\eta) = \sup_{Z\in U_L}h(\xi_Z:\eta_Z), \tag{I.1.7.3}$$

where $h(\xi_Z:\eta_Z)$ is given by (I.1.2.3). With this notation, denoting $U_S = U$,

$$h_S(\xi:\eta) = h(\xi:\eta). \tag{I.1.7.4}$$

Similarly, let

$$h_R(\xi:\eta) = \sup_{Z\in R}h(\xi_Z:\eta_Z). \tag{I.1.7.5}$$
Let $Z = \{Z_s\}$ $(1\le s\le n)$ be a partition of $X$. From Lemma I.3, with $\lambda_1 = P_\xi$, $\lambda_2 = P_\eta$, it follows that for any $\varepsilon > 0$ and for all $Z_s\in S$ $(1\le s\le n)$ we can find sets $Z_{(0)s}\in L$ such that

$$P_\xi(Z_s\triangle Z_{(0)s}) \le \varepsilon,\qquad P_\eta(Z_s\triangle Z_{(0)s}) \le \varepsilon\qquad(1\le s\le n). \tag{I.1.7.6}$$

Let

$$Z_1^{(0)} = Z_{(0)1}, \tag{I.1.7.7}$$

$$Z_k^{(0)} = Z_{(0)k} - \bigcup_{i=1}^{k-1}Z_{(0)i}\qquad(1 < k\le n), \tag{I.1.7.8}$$

$$Z_{n+1}^{(0)} = X - \bigcup_{i=1}^{n}Z_{(0)i}. \tag{I.1.7.9}$$

Because the family of sets $Z_k^{(0)}$ $(1\le k\le n+1)$ consists of non-overlapping sets whose union is $X$, it follows that $Z^{(0)} = \{Z_k^{(0)}\}$ is a partition of $X$ with $Z^{(0)}\in U_L$.

Using Lemma I.4, it follows that for the measure $P$, which can be $P_\xi$ or $P_\eta$, we have the inequalities

$$|P(Z_k^{(0)}) - P(Z_k)| \le |P(Z_k^{(0)}) - P(Z_{(0)k})| + |P(Z_{(0)k}) - P(Z_k)| \le P(Z_k^{(0)}\triangle Z_{(0)k}) + P(Z_{(0)k}\triangle Z_k)\qquad(1\le k\le n). \tag{I.1.7.10}$$

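Obviously, by (I.1.7.7), (I.1.7.8),

$$Z_k^{(0)}\subset Z_{(0)k},\qquad Z_{(0)k} - Z_k^{(0)} = \bigcup_{i=1}^{k-1}\big(Z_{(0)i}\cap Z_{(0)k}\big)\qquad(1\le k\le n), \tag{I.1.7.11}$$

and we will show that

$$Z_k^{(0)}\triangle Z_{(0)k} \subset \bigcup_{i=1}^{k}\big(Z_i\triangle Z_{(0)i}\big)\qquad(1\le k\le n). \tag{I.1.7.12}$$

For $1\le i < k\le n$ we have

$$Z_{(0)i}\cap Z_{(0)k} \subset \big[(Z_{(0)i} - Z_i)\cap Z_{(0)k}\big]\cup\big[Z_i\cap Z_{(0)k}\big], \tag{I.1.7.13}$$

so that

$$(Z_{(0)i} - Z_i)\cap Z_{(0)k} \subset Z_i\triangle Z_{(0)i}\qquad(1\le i < k\le n), \tag{I.1.7.14}$$

$$Z_i\cap Z_{(0)k} \subset Z_{(0)k} - Z_k \subset Z_k\triangle Z_{(0)k}\qquad(1\le i < k\le n). \tag{I.1.7.15}$$

From (I.1.7.11), (I.1.7.13)-(I.1.7.15), because $Z_i$ $(1\le i\le n)$ are non-overlapping (which gives $Z_i\cap Z_k = \emptyset$ in (I.1.7.15)), it follows that

$$Z_k^{(0)}\triangle Z_{(0)k} = Z_{(0)k} - Z_k^{(0)} = \bigcup_{i=1}^{k-1}\big(Z_{(0)i}\cap Z_{(0)k}\big) \subset \bigcup_{i=1}^{k}\big(Z_i\triangle Z_{(0)i}\big),$$

which proves (I.1.7.12). From (I.1.7.12) and (I.1.7.6) it follows

$$P(Z_k^{(0)}\triangle Z_{(0)k}) \le \sum_{i=1}^{k}P(Z_i\triangle Z_{(0)i}) \le k\varepsilon\qquad(1\le k\le n), \tag{I.1.7.16}$$

so that from (I.1.7.10) we obtain the inequality

$$|P(Z_k^{(0)}) - P(Z_k)| \le k\varepsilon + \varepsilon = (k+1)\varepsilon \le (n+1)\varepsilon\qquad(1\le k\le n), \tag{I.1.7.17}$$

so that

$$|P_\xi(Z_k^{(0)}) - P_\xi(Z_k)| \le (n+1)\varepsilon\qquad(1\le k\le n), \tag{I.1.7.18}$$

$$|P_\eta(Z_k^{(0)}) - P_\eta(Z_k)| \le (n+1)\varepsilon\qquad(1\le k\le n). \tag{I.1.7.19}$$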


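It follows that

$$P_\xi(Z_k^{(0)}) = P_\xi(Z_k) + \delta_{\xi,k};\qquad|\delta_{\xi,k}| \le (n+1)\varepsilon, \tag{I.1.7.20}$$

$$P_\eta(Z_k^{(0)}) = P_\eta(Z_k) + \delta_{\eta,k};\qquad|\delta_{\eta,k}| \le (n+1)\varepsilon, \tag{I.1.7.21}$$

and consequently

$$P_\xi(Z_k^{(0)})\log\frac{P_\xi(Z_k^{(0)})}{P_\eta(Z_k^{(0)})} = \big(P_\xi(Z_k) + \delta_{\xi,k}\big)\log\frac{P_\xi(Z_k)}{P_\eta(Z_k)} + P_\xi(Z_k^{(0)})\log\frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}}, \tag{I.1.7.22}$$

where

$$\varepsilon_{\xi,k} = \frac{\delta_{\xi,k}}{P_\xi(Z_k)},\qquad\varepsilon_{\eta,k} = \frac{\delta_{\eta,k}}{P_\eta(Z_k)}. \tag{I.1.7.23}$$

So

$$T_k = P_\xi(Z_k^{(0)})\log\frac{P_\xi(Z_k^{(0)})}{P_\eta(Z_k^{(0)})} - P_\xi(Z_k)\log\frac{P_\xi(Z_k)}{P_\eta(Z_k)} = \delta_{\xi,k}\log\frac{P_\xi(Z_k)}{P_\eta(Z_k)} + P_\xi(Z_k^{(0)})\log\frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}}. \tag{I.1.7.24}$$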


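Let us denote

$$M = \min\Big\{\min_{1\le k\le n}P_\xi(Z_k);\ \min_{1\le k\le n}P_\eta(Z_k)\Big\}. \tag{I.1.7.25}$$

Considering the case $h(\xi_Z:\eta_Z) < \infty$, we may assume $M > 0$. Let us denote also

$$a = \frac{n+1}{M}. \tag{I.1.7.26}$$

Then

$$|\varepsilon_{\xi,k}| = \frac{|\delta_{\xi,k}|}{P_\xi(Z_k)} \le \frac{(n+1)\varepsilon}{M} = a\varepsilon, \tag{I.1.7.27}$$

$$|\varepsilon_{\eta,k}| = \frac{|\delta_{\eta,k}|}{P_\eta(Z_k)} \le \frac{(n+1)\varepsilon}{M} = a\varepsilon. \tag{I.1.7.28}$$

From $\log x \le x - 1$ it follows

$$\log\frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}} \le \frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}} - 1 = \frac{\varepsilon_{\xi,k} - \varepsilon_{\eta,k}}{1+\varepsilon_{\eta,k}}, \tag{I.1.7.29}$$

from which, exchanging the roles of $\varepsilon_{\xi,k}$ and $\varepsilon_{\eta,k}$, it follows

$$\Big|\log\frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}}\Big| \le \frac{|\varepsilon_{\xi,k}| + |\varepsilon_{\eta,k}|}{\min\{1-|\varepsilon_{\xi,k}|,\ 1-|\varepsilon_{\eta,k}|\}}, \tag{I.1.7.30}$$

and from (I.1.7.27), (I.1.7.28) it follows

$$\min\{1-|\varepsilon_{\xi,k}|,\ 1-|\varepsilon_{\eta,k}|\} \ge 1 - a\varepsilon, \tag{I.1.7.30'}$$

so that, because $a\varepsilon < 1$ for $\varepsilon$ small enough, it follows

$$\Big|\log\frac{1+\varepsilon_{\xi,k}}{1+\varepsilon_{\eta,k}}\Big| \le \frac{2a\varepsilon}{1-a\varepsilon}. \tag{I.1.7.31}$$

Since $M\le P_\xi(Z_k) \le 1$ and $M\le P_\eta(Z_k)\le 1$, we have $\big|\log\frac{P_\xi(Z_k)}{P_\eta(Z_k)}\big| \le |\log M|$, so that from (I.1.7.24), (I.1.7.20) and $P_\xi(Z_k^{(0)})\le 1$ we obtain

$$|T_k| \le (n+1)\varepsilon|\log M| + \frac{2a\varepsilon}{1-a\varepsilon} = \alpha\varepsilon\qquad(1\le k\le n), \tag{I.1.7.32}$$

where

$$\alpha = \frac{2a}{1-a\varepsilon} + (n+1)|\log M|. \tag{I.1.7.32'}$$

Consequently,

$$\Big|\sum_{k=1}^{n}T_k\Big| \le \sum_{k=1}^{n}|T_k| \le n\alpha\varepsilon, \tag{I.1.7.33}$$

which can be written as

$$\Big|\sum_{s=1}^{n}P_\xi(Z_s^{(0)})\log\frac{P_\xi(Z_s^{(0)})}{P_\eta(Z_s^{(0)})} - \sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}\Big| \le n\alpha\varepsilon. \tag{I.1.7.33'}$$

Because $\bigcup_{i=1}^{n}Z_i = X$, it follows, for $P = P_\xi$ or $P = P_\eta$, that

$$P\Big(X - \bigcup_{i=1}^{n}Z_{(0)i}\Big) = P\Big(\bigcup_{i=1}^{n}Z_i - \bigcup_{i=1}^{n}Z_{(0)i}\Big) \le \sum_{i=1}^{n}P(Z_i\triangle Z_{(0)i}) \le n\varepsilon, \tag{I.1.7.34}$$

so that, from (I.1.7.9),

$$P(Z_{n+1}^{(0)}) = P\Big(X - \bigcup_{i=1}^{n}Z_{(0)i}\Big), \tag{I.1.7.35}$$

$$P_\xi(Z_{n+1}^{(0)}) \le n\varepsilon, \tag{I.1.7.36}$$

$$P_\eta(Z_{n+1}^{(0)}) \le n\varepsilon. \tag{I.1.7.36'}$$

Let us denote

$$T_{n+1} = P_\xi(Z_{n+1}^{(0)})\log\frac{P_\xi(Z_{n+1}^{(0)})}{P_\eta(Z_{n+1}^{(0)})}. \tag{I.1.7.37}$$

From $\log x \ge 1 - 1/x$ and (I.1.7.36') it follows that

$$T_{n+1} \ge P_\xi(Z_{n+1}^{(0)}) - P_\eta(Z_{n+1}^{(0)}) \ge -n\varepsilon. \tag{I.1.7.38}$$

Now

$$h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}}) = \sum_{s=1}^{n+1}P_\xi(Z_s^{(0)})\log\frac{P_\xi(Z_s^{(0)})}{P_\eta(Z_s^{(0)})}, \tag{I.1.7.39}$$

$$h(\xi_Z:\eta_Z) = \sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)}, \tag{I.1.7.40}$$

so that

$$h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}}) - h(\xi_Z:\eta_Z) = \sum_{k=1}^{n}T_k + T_{n+1}, \tag{I.1.7.41}$$

and from (I.1.7.33), (I.1.7.38) it follows that

$$h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}}) \ge h(\xi_Z:\eta_Z) - n(\alpha+1)\varepsilon. \tag{I.1.7.42}$$

Consequently, if $h(\xi_Z:\eta_Z)$ is finite, $h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}})$ is, for $\varepsilon$ sufficiently small, not less than $h(\xi_Z:\eta_Z)$ diminished by a quantity as small as we want. If (I.1.7.40) is not finite, then (I.1.7.39) will be as large as we want, taking $\varepsilon$ sufficiently small.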


Consequently, to any partition $Z\in U\equiv U_S$ there corresponds a partition $Z^{(0)}\in U_L$ such that $h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}})$ is not less than $h(\xi_Z:\eta_Z)$ diminished by a quantity as small as we want, so that from the definitions of $h_S(\xi:\eta)$, $h_L(\xi:\eta)$ it follows that

$$h_S(\xi:\eta) \le h_L(\xi:\eta). \tag{I.1.7.43}$$

From this, because any partition with elements in $L$ has a subpartition in $R$, it follows from (I.1.7.3), (I.1.7.5) and Lemma I.2 that

$$h_L(\xi:\eta) \le h_R(\xi:\eta). \tag{I.1.7.44}$$

Because $U_L\subset U_S$ and $R\subset U_S$, it follows that

$$h_L(\xi:\eta) \le h_S(\xi:\eta), \tag{I.1.7.45}$$

$$h_R(\xi:\eta) \le h_S(\xi:\eta). \tag{I.1.7.46}$$

From (I.1.7.43), (I.1.7.44), (I.1.7.45), (I.1.7.46) it follows that

$$h_R(\xi:\eta) = h_L(\xi:\eta) = h_S(\xi:\eta) = h(\xi:\eta), \tag{I.1.7.47}$$

which proves Theorem I.2.


Consequently, in the definition of $h(\xi:\eta)$, instead of considering all measurable partitions in $U$, we may consider only the subclass $R$.

For example, in the case when $X$ is the real line and $S$ the $\sigma$-algebra of all Borel sets in it, it is sufficient to consider only the class $R$ of partitions with elements in the algebra of finite unions of intervals. In the case when $X$ is the Cartesian product $X_1\times\dots\times X_n$ of $n$ real lines and $S$ is the $\sigma$-algebra of all Borel sets in it, it is sufficient to consider only the class $R$ of partitions with elements in the algebra of finite unions of $n$-dimensional intervals of the form $\Delta_1\times\dots\times\Delta_n$, where $\Delta_l$ is an interval on the real line $(1\le l\le n)$.
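As an illustration of this remark (not part of the original text), take $\xi\sim N(0,1)$ and $\eta\sim N(1,2^2)$ on the real line, an example chosen here for convenience. The sketch evaluates $h(\xi_Z:\eta_Z)$ from (I.1.2.3) on nested partitions of the line into finitely many intervals and compares it with the standard closed form of the relative entropy between two normal distributions. By Lemma I.2 the values cannot decrease under refinement, and by Theorem I.2 they approach $h(\xi:\eta)$. All function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal distribution function; math.erf also accepts x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_kl(mu1, s1, mu2, s2):
    """Closed-form h(xi:eta) for xi ~ N(mu1, s1^2), eta ~ N(mu2, s2^2), natural log."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5


def interval_edges(lo, hi, k):
    """Edges of the partition (-inf, lo], k equal intervals of (lo, hi], (hi, inf)."""
    return [-math.inf] + [lo + (hi - lo) * i / k for i in range(k + 1)] + [math.inf]


def h_on_intervals(edges, mu1, s1, mu2, s2):
    """h(xi_Z:eta_Z) of (I.1.2.3) for the interval partition given by `edges`."""
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        p = normal_cdf(b, mu1, s1) - normal_cdf(a, mu1, s1)
        q = normal_cdf(b, mu2, s2) - normal_cdf(a, mu2, s2)
        if p > 0:
            total += p * math.log(p / q)
    return total


exact = gaussian_kl(0.0, 1.0, 1.0, 2.0)  # equals log 2 - 1/4
values = [h_on_intervals(interval_edges(-6.0, 6.0, k), 0.0, 1.0, 1.0, 2.0)
          for k in (4, 16, 64, 256)]
print(values, exact)
```

Because each interval partition in the list refines the previous one, the printed values form a nondecreasing sequence bounded by the closed-form value.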


I.1.8. Theorem I.3

In order that the relative entropy of $\xi$ with respect to $\eta$ be finite, it is necessary that the probability distribution $P_\xi$ be absolutely continuous with respect to the probability distribution $P_\eta$.

Under this condition, the relative entropy $h(\xi:\eta)$, defined as the supremum (I.1.5.7) over all partitions of the range of $\xi$ and $\eta$ into a finite number of sets measurable with respect to $P_\xi$ and $P_\eta$, is equal to the following integral

$$h(\xi:\eta) = \int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx), \tag{I.1.8.1}$$

where $a_{\xi:\eta}(x)$ is the Radon-Nikodym derivative of $P_\xi$ with respect to $P_\eta$:

$$a_{\xi:\eta}(x) = \frac{P_\xi(dx)}{P_\eta(dx)}. \tag{I.1.8.2}$$

In this integral representation formula, the integral exists in the sense that the integral over the set where the integrand is negative converges. In particular, $h(\xi:\eta)$ is finite or not according as this integral is finite or not.


Obviously, the integral representation formula can be written as

$$h(\xi:\eta) = \int_X l_{\xi:\eta}(x)\,P_\xi(dx), \tag{I.1.8.3}$$

where

$$l_{\xi:\eta}(x) = \log a_{\xi:\eta}(x) \tag{I.1.8.4}$$

is the relative entropy density of $P_\xi$ with respect to $P_\eta$. In the particular case that the probability measures $P_\xi$, $P_\eta$ are defined in terms of densities $\pi_\xi(x)$, $\pi_\eta(x)$ with respect to $\mu$, the integral representation formula reduces to

$$h(\xi:\eta) = \int_X \pi_\xi(x)\log\frac{\pi_\xi(x)}{\pi_\eta(x)}\,\mu(dx), \tag{I.1.8.5}$$

where the integration is with respect to the $\mu$-measure, and

$$a_{\xi:\eta}(x) = \frac{\pi_\xi(x)}{\pi_\eta(x)}. \tag{I.1.8.6}$$
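Formula (I.1.8.3) expresses $h(\xi:\eta)$ as the mean of the relative entropy density $l_{\xi:\eta}$ under $P_\xi$. A rough Monte Carlo sketch of this reading, assuming (for illustration only) the normal densities $\pi_\xi = N(0,1)$ and $\pi_\eta = N(1,2^2)$ with respect to Lebesgue measure. The sample size and seed are arbitrary choices and the helper names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """log of the N(mu, sigma^2) density at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def mc_relative_entropy(mu1, s1, mu2, s2, n=100_000, seed=0):
    """Average of l(x) = log(pi_xi(x) / pi_eta(x)) over n samples x drawn from P_xi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, s1)
        total += log_normal_pdf(x, mu1, s1) - log_normal_pdf(x, mu2, s2)
    return total / n


estimate = mc_relative_entropy(0.0, 1.0, 1.0, 2.0)
exact = math.log(2.0) - 0.25   # closed form for these two normal laws
print(estimate, exact)
```

The estimate fluctuates with the seed; its error decreases roughly like $1/\sqrt{n}$.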


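Obviously, in the particular case that $X$ is a countable space of elements $x_i$, and $P_\xi$, $P_\eta$ are given as in (I.1.1.1), then from (I.1.8.2) it follows

$$a_{\xi:\eta}(x_i) = \frac{P_\xi(x_i)}{P_\eta(x_i)}, \tag{I.1.8.7}$$

and from (I.1.8.4) it follows

$$l_{\xi:\eta}(x_i) = \log\frac{P_\xi(x_i)}{P_\eta(x_i)}, \tag{I.1.8.8}$$

and the integral representation formula reduces to (I.1.1.2) with $n = \infty$.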
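Proof of Theorem I.3

First part of the proof. If $P_\xi$ is not absolutely continuous with respect to $P_\eta$, then there exists a set $B\in S$ such that $P_\xi(B) > 0$, $P_\eta(B) = 0$. Considering the partition $Z\in U$ consisting of the two elements $Z_1 = B$, $Z_2 = X - B$, it follows that

$$h(\xi_Z:\eta_Z) = P_\xi(B)\log\frac{P_\xi(B)}{P_\eta(B)} + P_\xi(X - B)\log\frac{P_\xi(X - B)}{P_\eta(X - B)} \tag{I.1.8.9}$$

is not finite, and so neither is

$$h(\xi:\eta) \ge h(\xi_Z:\eta_Z). \tag{I.1.8.10}$$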

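Second part of the proof. Let us consider that $P_\xi$ is absolutely continuous with respect to $P_\eta$.

In what follows, let $P$ be some probability measure on the real line such that

$$\int_0^\infty P(du) = 1. \tag{I.1.8.11}$$

If $\varphi(u)$ is some concave function on $[0,\infty)$, Jensen's inequality gives

$$\int_0^\infty\varphi(u)\,P(du) \le \varphi\Big(\int_0^\infty u\,P(du)\Big). \tag{I.1.8.12}$$

Let us take

$$\varphi(u) = -u\log u, \tag{I.1.8.13}$$

for which

$$\varphi''(u) = -\frac{1}{u} < 0\qquad(0 < u < \infty), \tag{I.1.8.14}$$

so $\varphi(u)$ is concave on $0\le u < \infty$, and from (I.1.8.12)-(I.1.8.14) it follows the inequality

$$\int_0^\infty(u\log u)\,P(du) \ge \Big[\int_0^\infty u\,P(du)\Big]\log\Big[\int_0^\infty u\,P(du)\Big]. \tag{I.1.8.15}$$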


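Let $T\in S$ be such that $P_\eta(T)\ne 0$. We define the measure $P_T$ on the Borel sets of $[0,\infty)$ by the relation

$$P_T(A) = P_\eta\{[x;\ a_{\xi:\eta}(x)\in A]\ |\ x\in T\},\qquad A\ \text{a Borel subset of}\ [0,\infty), \tag{I.1.8.16}$$

where the bar means conditional probability. Let $f(u)$ be some Borel measurable function on $[0,\infty)$. Then

$$\int_0^\infty f(u)\,P_T(du) = \int_0^\infty f(u)\,\frac{P_\eta\{[x;\ u\le a_{\xi:\eta}(x) < u+du]\cap T\}}{P_\eta(T)} = \frac{1}{P_\eta(T)}\int_T f(a_{\xi:\eta}(x))\,P_\eta(dx). \tag{I.1.8.17}$$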




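If $f(u)\equiv 1$, from (I.1.8.17) it follows that

$$\int_0^\infty P_T(du) = \frac{1}{P_\eta(T)}\int_T P_\eta(dx) = 1, \tag{I.1.8.18}$$

i.e. the measure $P_T$ is a probability measure.

If $f(u) = u$, from (I.1.8.17) it follows that

$$\int_0^\infty u\,P_T(du) = \frac{1}{P_\eta(T)}\int_T a_{\xi:\eta}(x)\,P_\eta(dx) = \frac{P_\xi(T)}{P_\eta(T)}. \tag{I.1.8.19}$$

If $f(u) = u\log u$, from (I.1.8.17) it follows that

$$\int_0^\infty u\log u\,P_T(du) = \frac{1}{P_\eta(T)}\int_T a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) = \frac{1}{P_\eta(T)}\int_T[\log a_{\xi:\eta}(x)]\,P_\xi(dx). \tag{I.1.8.20}$$

From (I.1.8.15), because of (I.1.8.19), (I.1.8.20), it follows

$$\frac{1}{P_\eta(T)}\int_T[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \ge \frac{P_\xi(T)}{P_\eta(T)}\log\frac{P_\xi(T)}{P_\eta(T)},$$

i.e.

$$P_\xi(T)\log\frac{P_\xi(T)}{P_\eta(T)} \le \int_T[\log a_{\xi:\eta}(x)]\,P_\xi(dx). \tag{I.1.8.21}$$

Now, let $Z_s$ be the elements of some partition $Z$ of $X$. With $T = Z_s$, from (I.1.8.21) it follows

$$P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \le \int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx), \tag{I.1.8.22}$$

and consequently

$$h(\xi_Z:\eta_Z) = \sum_s P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \le \sum_s\int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) = \int_X[\log a_{\xi:\eta}(x)]\,P_\xi(dx), \tag{I.1.8.23}$$

so from (I.1.5.7) it follows

$$h(\xi:\eta) \le \int_X[\log a_{\xi:\eta}(x)]\,P_\xi(dx). \tag{I.1.8.24}$$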


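Third part of the proof.

Because

$$\lim_{x\to 0}(x\log x) = 0,\qquad\lim_{k\to\infty}P_\xi\{x;\ |\log a_{\xi:\eta}(x)| > k\} = 0, \tag{I.1.8.25}$$

it follows that, for any $\varepsilon > 0$, we may choose $k > 0$ so large that

$$-\varepsilon \le P_\xi\{x;\ |\log a_{\xi:\eta}(x)| > k\}\,\log P_\xi\{x;\ |\log a_{\xi:\eta}(x)| > k\} \le 0. \tag{I.1.8.26}$$

Obviously

$$\{x;\ |\log a_{\xi:\eta}(x)| \le k\} = \{x;\ e^{-k}\le a_{\xi:\eta}(x)\le e^{k}\}. \tag{I.1.8.27}$$

Let $Z_s$ $(1\le s\le n)$ be disjoint sets in $S$ such that

a) $$\bigcup_{s=1}^{n}Z_s = \{x;\ |\log a_{\xi:\eta}(x)| \le k\}, \tag{I.1.8.28}$$

b) $$\log e_s - \log m_s \le \varepsilon\qquad(1\le s\le n), \tag{I.1.8.28'}$$

where

$$e_s = \sup\{a_{\xi:\eta}(x);\ x\in Z_s\};\qquad m_s = \inf\{a_{\xi:\eta}(x);\ x\in Z_s\}. \tag{I.1.8.29}$$

Let us define the set

$$Z_{n+1} = \{x;\ |\log a_{\xi:\eta}(x)| > k\}, \tag{I.1.8.30}$$

so that

$$\bigcup_{s=1}^{n+1}Z_s = X, \tag{I.1.8.31}$$

and consequently the sets $Z_s$ $(1\le s\le n+1)$ form a partition $Z^{(0)}$ of $X$.

Obviously, for any $s$ such that $1\le s\le n$,

$$P_\xi(Z_s) = \int_{Z_s}a_{\xi:\eta}(x)\,P_\eta(dx) \le e_s\int_{Z_s}P_\eta(dx) = e_s P_\eta(Z_s), \tag{I.1.8.32}$$

$$P_\xi(Z_s) = \int_{Z_s}a_{\xi:\eta}(x)\,P_\eta(dx) \ge m_s\int_{Z_s}P_\eta(dx) = m_s P_\eta(Z_s). \tag{I.1.8.32'}$$

Similarly

$$\int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \le \int_{Z_s}[\log e_s]\,P_\xi(dx) = (\log e_s)\,P_\xi(Z_s), \tag{I.1.8.33}$$

$$\int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \ge \int_{Z_s}[\log m_s]\,P_\xi(dx) = (\log m_s)\,P_\xi(Z_s). \tag{I.1.8.33'}$$

From (I.1.8.32), (I.1.8.32') it follows

$$m_s \le \frac{P_\xi(Z_s)}{P_\eta(Z_s)} \le e_s, \tag{I.1.8.34}$$

and from (I.1.8.33), (I.1.8.33') it follows

$$(\log m_s)\,P_\xi(Z_s) \le \int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \le (\log e_s)\,P_\xi(Z_s). \tag{I.1.8.35}$$

From (I.1.8.22), (I.1.8.35) it follows

$$P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \le \int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \le (\log e_s)\,P_\xi(Z_s), \tag{I.1.8.36}$$

so that from (I.1.8.35), (I.1.8.36) it follows

$$P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} - \int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) \le (\log e_s)\,P_\xi(Z_s) - (\log m_s)\,P_\xi(Z_s) = (\log e_s - \log m_s)\,P_\xi(Z_s). \tag{I.1.8.37}$$

From (I.1.8.34), (I.1.8.35) it follows similarly

$$\int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) - P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \le (\log e_s)\,P_\xi(Z_s) - (\log m_s)\,P_\xi(Z_s) = (\log e_s - \log m_s)\,P_\xi(Z_s). \tag{I.1.8.37'}$$

From (I.1.8.37), (I.1.8.37') and (I.1.8.28') it follows

$$\Big|P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} - \int_{Z_s}[\log a_{\xi:\eta}(x)]\,P_\xi(dx)\Big| \le (\log e_s - \log m_s)\,P_\xi(Z_s) \le \varepsilon P_\xi(Z_s)\qquad(1\le s\le n), \tag{I.1.8.38}$$

from which it follows, since $\bigcup_{s=1}^{n}Z_s = X - Z_{n+1}$,

$$\Big|\sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} - \int_{X - Z_{n+1}}[\log a_{\xi:\eta}(x)]\,P_\xi(dx)\Big| \le \sum_{s=1}^{n}\varepsilon P_\xi(Z_s) = \varepsilon P_\xi\Big(\bigcup_{s=1}^{n}Z_s\Big) \le \varepsilon. \tag{I.1.8.39}$$

From (I.1.8.26), because $P_\eta(Z_{n+1}) \le 1$, we obtain

$$-\varepsilon \le P_\xi(Z_{n+1})\log P_\xi(Z_{n+1}) \le P_\xi(Z_{n+1})\log P_\xi(Z_{n+1}) - P_\xi(Z_{n+1})\log P_\eta(Z_{n+1}). \tag{I.1.8.40}$$

From (I.1.8.39) it follows

$$\sum_{s=1}^{n}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \ge \int_{X - Z_{n+1}}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) - \varepsilon. \tag{I.1.8.41}$$

From (I.1.8.40), (I.1.8.41), by addition, it follows

$$\sum_{s=1}^{n+1}P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \ge \int_{X - Z_{n+1}}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) - 2\varepsilon, \tag{I.1.8.42}$$

or

$$h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}}) \ge \int_{X - Z_{n+1}}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) - 2\varepsilon, \tag{I.1.8.43}$$

from which it follows

$$h(\xi:\eta) = \sup_{Z\in U}h(\xi_Z:\eta_Z) \ge h(\xi_{Z^{(0)}}:\eta_{Z^{(0)}}) \ge \int_{\{x;\ |\log a_{\xi:\eta}(x)|\le k\}}[\log a_{\xi:\eta}(x)]\,P_\xi(dx) - 2\varepsilon, \tag{I.1.8.44}$$

and if $k\to\infty$, $\varepsilon\to 0$, we obtain the inequality

$$h(\xi:\eta) \ge \int_X[\log a_{\xi:\eta}(x)]\,P_\xi(dx). \tag{I.1.8.45}$$

From (I.1.8.24), (I.1.8.45) it follows (I.1.8.1).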
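The partition used in the third part of the proof groups points according to the value of $\log a_{\xi:\eta}(x)$ in intervals of length $\varepsilon$. On a finite space with all probabilities positive (an assumption made only for this sketch, so that for $k$ large enough the set $Z_{n+1}$ is empty), the construction and the bound behind (I.1.8.39) can be checked numerically. The function name and the random example distributions are hypothetical.

```python
import math
import random


def level_set_partition_h(p, q, eps):
    """h(xi_Z:eta_Z) for the partition grouping points x_i whose
    log a(x_i) = log(p_i / q_i) lies in the same interval [j*eps, (j+1)*eps)."""
    cells = {}
    for p_i, q_i in zip(p, q):
        j = math.floor(math.log(p_i / q_i) / eps)
        cp, cq = cells.get(j, (0.0, 0.0))
        cells[j] = (cp + p_i, cq + q_i)
    return sum(cp * math.log(cp / cq) for cp, cq in cells.values())


rng = random.Random(2)
raw_p = [rng.uniform(0.01, 1.0) for _ in range(200)]
raw_q = [rng.uniform(0.01, 1.0) for _ in range(200)]
sp, sq = sum(raw_p), sum(raw_q)
p = [v / sp for v in raw_p]
q = [v / sq for v in raw_q]

exact = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))
for eps in (0.5, 0.1, 0.01):
    gap = exact - level_set_partition_h(p, q, eps)
    print(eps, gap)  # 0 <= gap <= eps, as in (I.1.8.38)-(I.1.8.39)
```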

In the integral (I.1.8.1), let us make the substitution

$$u = a_{\xi:\eta}(x). \tag{I.1.8.46}$$

This transforms the integrand into

$$u\log u \tag{I.1.8.47}$$

and the measure $P_\eta(dx)$ into some probability measure $L(du)$ on $[0,\infty)$. So (I.1.8.1) takes the form

$$h(\xi:\eta) = \int_0^\infty(u\log u)\,L(du). \tag{I.1.8.48}$$

Let us denote

$$f(u) = u\log u. \tag{I.1.8.49}$$

Obviously

$$\{u;\ f(u) < 0\} = \{u;\ 0 < u < 1\}. \tag{I.1.8.50}$$

From

$$\frac{df}{du} = 1 + \log u$$

it is seen that $f(u)$ has a minimum value for $u = e^{-1}$, and so

$$f(u) \ge f(e^{-1}) = -e^{-1}, \tag{I.1.8.51}$$

so that

$$\int_0^1 u\log u\,L(du) \ge -e^{-1}\int_0^1 L(du) = -e^{-1}L([0,1]) \ge -e^{-1}, \tag{I.1.8.52}$$

from which it follows that the integral in (I.1.8.1) over the set where the integrand is negative converges.
I.2. Some elementary properties of relative entropy

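I.2.1. Let us consider that the random vectors $\xi$, $\eta$ take values in the measurable space $(X, S)$. Let us consider also another measurable space $(\tilde X, \tilde S)$, and let $f(x)$ be an S-measurable function defined on $(X, S)$ with values in $(\tilde X, \tilde S)$. The compound functions

$$\tilde\xi(\omega) = f(\xi(\omega)),\qquad\tilde\eta(\omega) = f(\eta(\omega)) \tag{I.2.1.1}$$

are random vectors with values in $(\tilde X, \tilde S)$.

Let

$$P_\xi(T) = P\{\omega;\ \xi(\omega)\in T\};\quad P_\eta(T) = P\{\omega;\ \eta(\omega)\in T\};\quad T\in S, \tag{I.2.1.2}$$

$$P_{\tilde\xi}(\tilde T) = P\{\omega;\ \tilde\xi(\omega)\in\tilde T\};\quad P_{\tilde\eta}(\tilde T) = P\{\omega;\ \tilde\eta(\omega)\in\tilde T\};\quad\tilde T\in\tilde S. \tag{I.2.1.2'}$$

Theorem I.4. For any random vectors $\xi$, $\eta$ and any function $f$,

$$h(\tilde\xi:\tilde\eta) \le h(\xi:\eta), \tag{I.2.1.3}$$

with equality if $f^{-1}$ exists a.e. $(P_\xi + P_\eta)$.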

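Proof. Let $\tilde T\subset\tilde X$, and

$$T = f^{-1}(\tilde T) = \{x\in X;\ f(x)\in\tilde T\}. \tag{I.2.1.4}$$

Because $f(x)$ is S-measurable, it follows that if $\tilde T\in\tilde S$, then $T = f^{-1}(\tilde T)\in S$ and

$$P_{\tilde\xi}(\tilde T) = P_\xi\{x;\ f(x)\in\tilde T\} = P_\xi\{x;\ x\in T\} = P_\xi(T), \tag{I.2.1.5}$$

$$P_{\tilde\eta}(\tilde T) = P_\eta\{x;\ f(x)\in\tilde T\} = P_\eta\{x;\ x\in T\} = P_\eta(T). \tag{I.2.1.5'}$$

From the relation

$$f^{-1}\Big(\bigcup_{i=1}^{k}\tilde T_i\Big) = \bigcup_{i=1}^{k}f^{-1}(\tilde T_i), \tag{I.2.1.6}$$

if $\tilde T_i\in\tilde S$ $(1\le i\le k)$, it follows that $T_i = f^{-1}(\tilde T_i)\in S$ $(1\le i\le k)$; also, if $\bigcup_{i=1}^{k}\tilde T_i = \tilde X$, it follows that $\bigcup_{i=1}^{k}T_i = X$. Also from

$$f^{-1}(\tilde T_1\cap\tilde T_2) = f^{-1}(\tilde T_1)\cap f^{-1}(\tilde T_2) \tag{I.2.1.7}$$

it follows that if $\tilde T_1\cap\tilde T_2 = \emptyset$, then $T_1\cap T_2 = \emptyset$.

From the above it follows that if $\tilde Z = \{\tilde Z_s\}$ is a partition of $\tilde X$, then $Z = f^{-1}(\tilde Z) = \{Z_s\}$ with $Z_s = f^{-1}(\tilde Z_s)$ is a partition of $X$.

Let us denote the totality of such partitions $Z = f^{-1}(\tilde Z)$ of $X$ by $R$, all partitions of $\tilde X$ by $U_{\tilde S}$, and all partitions of $X$ by $U_S$, so that $R\subset U_S$ and $U_S = R\cup(U_S - R)$.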

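From (I.2.1.5), (I.2.1.5') it follows

$$h(\tilde\xi_{\tilde Z}:\tilde\eta_{\tilde Z}) = \sum_{s=1}^{n} P_{\tilde\xi}(\tilde Z_s)\log\frac{P_{\tilde\xi}(\tilde Z_s)}{P_{\tilde\eta}(\tilde Z_s)} = \sum_{s=1}^{n} P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} = h(\xi_Z:\eta_Z) \qquad (I.2.1.8)$$

So

$$h(\tilde\xi:\tilde\eta) = \sup_{\tilde Z \in U_{\tilde S}} h(\tilde\xi_{\tilde Z}:\tilde\eta_{\tilde Z}) = \sup_{Z \in R} h(\xi_Z:\eta_Z) \qquad (I.2.1.9)$$

and

$$h(\xi:\eta) = \sup_{Z \in U_S} h(\xi_Z:\eta_Z) = \max\Big\{\sup_{Z \in R} h(\xi_Z:\eta_Z);\ \sup_{Z \in U_S - R} h(\xi_Z:\eta_Z)\Big\} = \max\Big\{h(\tilde\xi:\tilde\eta);\ \sup_{Z \in U_S - R} h(\xi_Z:\eta_Z)\Big\} \ge h(\tilde\xi:\tilde\eta) \qquad (I.2.1.10)$$

which proves (I.2.1.3). In the case when the inverse function $f^{-1}$ exists $(P_\xi + P_\eta)$-a.e., we can interchange the roles of $\xi, \eta$ and $\tilde\xi, \tilde\eta$, obtaining the inequality opposite to (I.2.1.3); both together give the equality.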

Obviously the above result remains true in the particular case when $(\tilde X, \tilde S)$ is identical with $(X, S)$; of particular interest is the case when f is a linear function, and

$$\tilde\xi = f(\xi) = A\xi, \qquad \tilde\eta = f(\eta) = A\eta.$$

Theorem 1.4'.

$$h(A\xi:A\eta) \le h(\xi:\eta) \qquad (I.2.1.11)$$

with equality if A is nonsingular.

This theorem follows from Theorem 1.4, but it can also be obtained from the fact that the integral representation (I.1.8.1) of $h(\xi:\eta)$ does not depend on the system of coordinates in the vector space X.
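On a finite space the supremum over partitions is attained by the finest partition, so $h(\xi:\eta)$ reduces to a finite sum and Theorem 1.4 can be checked directly. The following is a minimal Python sketch under that assumption; the distributions and the helper names `rel_entropy`, `push_forward`, `merge`, `relabel` are hypothetical illustrations, not the author's. It compares $h(\xi:\eta)$ with $h(f(\xi):f(\eta))$ for a non-injective f and for a bijection.

```python
import math
from collections import defaultdict


def rel_entropy(p, q):
    """h(xi:eta) = sum_x p(x) log(p(x)/q(x)) for pmfs given as dicts.

    Conventions: terms with p(x) = 0 are dropped; p(x) > 0 = q(x) gives +inf.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def push_forward(p, f):
    """pmf of f(xi) when xi has pmf p: P_{f(xi)}(t) = P_xi(f^{-1}(t)), cf. (I.2.1.5)."""
    out = defaultdict(float)
    for x, px in p.items():
        out[f(x)] += px
    return dict(out)


# Hypothetical distributions of xi and eta on X = {0, 1, 2, 3}.
p = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
q = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}


def merge(x):
    # non-injective f: {0, 1} -> 0 and {2, 3} -> 1
    return x // 2


def relabel(x):
    # injective f (a bijection of X), so f^{-1} exists
    return 3 - x


h = rel_entropy(p, q)
h_merge = rel_entropy(push_forward(p, merge), push_forward(q, merge))
h_relabel = rel_entropy(push_forward(p, relabel), push_forward(q, relabel))
print(h, h_merge, h_relabel)
```

Merging the cells {0, 1} and {2, 3} happens to erase every difference between the two laws here, so the merged entropy drops to zero, while the bijection leaves h unchanged, in line with the equality case of (I.2.1.3).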
Theorem 1.5. If $\xi_n$ converges in distribution to $\xi$ and $\eta_n$ to $\eta$, then

$$\liminf_{n\to\infty} h(\xi_n:\eta_n) \ge h(\xi:\eta) \qquad (I.2.2.2)$$

Proof. From Definition I.1 it follows that

1° if $h(\xi:\eta) < \infty$, for any $\varepsilon > 0$ there exists a partition $Z = \{Z_s\} \in U$ such that

$$h(\xi_Z:\eta_Z) > h(\xi:\eta) - \varepsilon \qquad (I.2.2.3)$$

2° if $h(\xi:\eta) = \infty$, for any $N > 0$ there exists a partition $Z = \{Z_s\} \in U$ such that

$$h(\xi_Z:\eta_Z) > N \qquad (I.2.2.3')$$

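Because $Z \in U$, it follows that for any $Z_s$ belonging to Z, in both cases 1°, 2°,

$$\lim_{n\to\infty} P_{\xi_n}(Z_s) = P_\xi(Z_s), \qquad \lim_{n\to\infty} P_{\eta_n}(Z_s) = P_\eta(Z_s) \qquad (1 \le s \le m) \qquad (I.2.2.4)$$

Since

$$h(\xi_{nZ}:\eta_{nZ}) = \sum_{s=1}^{m} P_{\xi_n}(Z_s)\log\frac{P_{\xi_n}(Z_s)}{P_{\eta_n}(Z_s)} \qquad (I.2.2.5)$$

$$h(\xi_Z:\eta_Z) = \sum_{s=1}^{m} P_\xi(Z_s)\log\frac{P_\xi(Z_s)}{P_\eta(Z_s)} \qquad (I.2.2.5')$$

it follows from (I.2.2.4) that

$$\lim_{n\to\infty} h(\xi_{nZ}:\eta_{nZ}) = h(\xi_Z:\eta_Z) \qquad (I.2.2.6)$$

Obviously,

$$h(\xi_n:\eta_n) = \sup_{Z^{(0)} \in U_S} h(\xi_{nZ^{(0)}}:\eta_{nZ^{(0)}}) \ge h(\xi_{nZ}:\eta_{nZ}) \qquad (I.2.2.7)$$

From (I.2.2.6), (I.2.2.7) it follows

$$\liminf_{n\to\infty} h(\xi_n:\eta_n) \ge \lim_{n\to\infty} h(\xi_{nZ}:\eta_{nZ}) = h(\xi_Z:\eta_Z) \qquad (I.2.2.8)$$

i.e.

$$\liminf_{n\to\infty} h(\xi_n:\eta_n) \ge h(\xi_Z:\eta_Z) \qquad (I.2.2.8')$$

1° From (I.2.2.3), (I.2.2.8) it follows

$$\liminf_{n\to\infty} h(\xi_n:\eta_n) \ge h(\xi:\eta) - \varepsilon \qquad (I.2.2.9)$$

for any $\varepsilon > 0$.

2° From (I.2.2.3'), (I.2.2.8') it follows

$$\liminf_{n\to\infty} h(\xi_n:\eta_n) \ge N \qquad (I.2.2.9')$$

for any $N > 0$.

From (I.2.2.9) it follows (I.2.2.2) in the case $h(\xi:\eta) < \infty$, and from (I.2.2.9') it follows

$$\lim_{n\to\infty} h(\xi_n:\eta_n) = \infty \qquad (I.2.2.10)$$

in the case $h(\xi:\eta) = \infty$, i.e. (I.2.2.2). So the theorem is proved.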
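The inequality (I.2.2.2) can be strict. The following minimal Python sketch uses a hypothetical example of our own (not from the source): $\xi_n$ puts mass 1/2 at 0 and 1/2 at $1/n$, and $\eta_n$ puts mass 1/4 at 0 and 3/4 at $1/n$. Both sequences converge in distribution to the point mass at 0, so $h(\xi:\eta) = 0$, while $h(\xi_n:\eta_n) = \frac12\log\frac43$ for every n.

```python
import math


def rel_entropy(p, q):
    # h for finitely supported pmfs {point: probability}; assumes supp p within supp q
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def xi_n(n):
    # hypothetical: mass 1/2 at 0 and 1/2 at 1/n
    return {0.0: 0.5, 1.0 / n: 0.5}


def eta_n(n):
    # hypothetical: mass 1/4 at 0 and 3/4 at 1/n
    return {0.0: 0.25, 1.0 / n: 0.75}


values = [rel_entropy(xi_n(n), eta_n(n)) for n in (1, 10, 100, 1000)]
# Both sequences converge in distribution to the point mass at 0.
limit_h = rel_entropy({0.0: 1.0}, {0.0: 1.0})
print(values, limit_h)  # each value is 0.5*log(4/3) up to rounding; limit_h is 0
```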


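1.2.3. Let us consider two sequences of random vectors $\xi_n, \eta_n$ taking values in the measurable spaces $(X_n, S_n)$ $(n = 1, 2, \dots)$, so that the random vectors $\xi^{(n)} = (\xi_1, \dots, \xi_n)$, $\eta^{(n)} = (\eta_1, \dots, \eta_n)$ take values in the measurable spaces

$$(X^{(n)}, S^{(n)}) = \mathop{\times}_{i=1}^{n} (X_i, S_i) \qquad (n = 1, 2, \dots)$$

with elements $x^{(n)} \in X^{(n)}$, and the random vectors $\xi = (\xi_1, \xi_2, \dots)$, $\eta = (\eta_1, \eta_2, \dots)$ take values in the measurable space

$$(X, S) = \mathop{\times}_{i=1}^{\infty} (X_i, S_i)$$

with elements $x \in X$. Obviously $x^{(n)} = (x_1, \dots, x_n) \in X^{(n)}$, $x = (x_1, x_2, \dots) \in X$.

Theorem 1.6.

$$\lim_{n\to\infty} h(\xi^{(n)}:\eta^{(n)}) = h(\xi:\eta) \qquad (I.2.3.1)$$

Proof. To any $x^{(n+1)} = (x_1, \dots, x_n, x_{n+1}) \in X^{(n+1)}$ there corresponds an element $x^{(n)} = (x_1, \dots, x_n) \in X^{(n)}$. Let us denote this correspondence by $F_n$, i.e.

$$F_n(x^{(n+1)}) = x^{(n)} \qquad (I.2.3.2)$$

is a function with domain $X^{(n+1)}$ and range $X^{(n)}$. Consequently, for $\xi^{(n+1)} = (\xi_1, \dots, \xi_n, \xi_{n+1})$ and $\eta^{(n+1)}$ the relations

$$F_n(\xi^{(n+1)}) = \xi^{(n)}, \qquad F_n(\eta^{(n+1)}) = \eta^{(n)} \qquad (I.2.3.3)$$

hold. From (I.2.1.3), (I.2.3.3) it follows that

$$h(\xi^{(n)}:\eta^{(n)}) \le h(\xi^{(n+1)}:\eta^{(n+1)}) \qquad (n = 1, 2, \dots) \qquad (I.2.3.4)$$

so that the sequence $h(\xi^{(n)}:\eta^{(n)})$ is nondecreasing and consequently the limit

$$\lim_{n\to\infty} h(\xi^{(n)}:\eta^{(n)}) \qquad (I.2.3.5)$$

exists (finite or infinite).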


Let us denote:

1. $M$ - the totality of all sets of the form

$$T = \mathop{\times}_{i=1}^{n} Z_i \times \mathop{\times}_{j=n+1}^{\infty} X_j \in S = \mathop{\times}_{i=1}^{\infty} S_i \qquad (I.2.3.6)$$

where $Z_i \in S_i$, $Z_i \subset X_i$ $(1 \le i \le n)$, $(n = 1, 2, \dots)$.

2. $L$ - the algebra of all finite sums of sets belonging to $M$.
3. $r(T) = n$ for T in (I.2.3.6).
4. $M_n$ - the totality of sets $T \in S$ of the form (I.2.3.6) with given $r(T) = n$.
5. $L_n$ - the algebra of all finite sums of sets belonging to $M_n$.
6. $U_L$ - the totality of partitions V of X with elements in $L$.
7. $U_{L_n}$ - the totality of partitions V of X with elements in $L_n$.
8. $U_M$ - the totality of partitions V of X with elements in $M$.
9. $U_{M_n}$ - the totality of partitions V of X with elements in $M_n$.
10. $M^{(n)}$ - the totality of sets of the form

$$T^{(n)} = \mathop{\times}_{i=1}^{n} Z_i \in S^{(n)} = \mathop{\times}_{i=1}^{n} S_i \qquad (I.2.3.6')$$

11. $L^{(n)}$ - the algebra of all finite sums of sets belonging to $M^{(n)}$.
12. $U_{L^{(n)}}$ - the totality of partitions of $X^{(n)}$ with elements in $L^{(n)}$.
13. $U_{M^{(n)}}$ - the totality of partitions of $X^{(n)}$ with elements in $M^{(n)}$.
14. $R = \bigcup_{n=1}^{\infty} U_{M_n}$ (I.2.3.7).

It is obvious that

1°. $L$ generates the σ-algebra S.
2°. Any partition $V \in U_L$ has a subpartition $V_0 \in U_M$.
3°. Any partition $V \in U_M$ has a subpartition $V_0 \in U_{M_n}$ for some value of n, because the number of elements in V is finite; i.e., any partition $V \in U_M$ has a subpartition $V_0 \in R$.
Consequently, from 2° and 3° it follows that any partition $V \in U_L$ has a subpartition $V_0 \in R$, and from Theorem 1.2 it follows

$$h(\xi:\eta) = \sup_{V \in R} h(\xi_V:\eta_V) = \sup_{1 \le n < \infty}\ \sup_{V \in U_{M_n}} h(\xi_V:\eta_V) \qquad (I.2.3.8)$$

4°. $L^{(n)}$ generates the σ-algebra $S^{(n)}$.

5°. Any partition $V^{(n)} \in U_{L^{(n)}}$ has a subpartition $V_0^{(n)} \in U_{M^{(n)}}$, so that from Theorem 1.2

$$h(\xi^{(n)}:\eta^{(n)}) = \sup_{V^{(n)} \in U_{M^{(n)}}} h(\xi^{(n)}_{V^{(n)}}:\eta^{(n)}_{V^{(n)}}) \qquad (I.2.3.9)$$


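Let $A_n$ be the one-to-one transformation established by (I.2.3.6), (I.2.3.6') between $M_n$ and $M^{(n)}$, i.e.

$$A_n(T) = T^{(n)}, \qquad T \in M_n, \quad T^{(n)} \in M^{(n)} \qquad (I.2.3.10)$$

Obviously, $A_n$ is measure preserving in the sense that

$$P_\xi(T) = P_{\xi^{(n)}}(A_n(T)), \qquad P_\eta(T) = P_{\eta^{(n)}}(A_n(T)) \qquad (I.2.3.11)$$

We can define a one-to-one correspondence $B_n$ between $U_{M_n}$ and $U_{M^{(n)}}$ so that

$$B_n(V) = V^{(n)} \qquad (I.2.3.12)$$

where $V = \{V_s\}$, $V_s \in M_n$ $(1 \le s \le e)$, $V^{(n)} = \{V_s^{(n)}\}$, $V_s^{(n)} \in M^{(n)}$ $(1 \le s \le e)$, $V_s = V_s^{(n)} \times \mathop{\times}_{i=n+1}^{\infty} X_i$; i.e. $A_n(V_s) = V_s^{(n)}$ $(1 \le s \le e)$.

Consequently, for $V \in U_{M_n}$, $B_n(V) \in U_{M^{(n)}}$,

$$h(\xi_V:\eta_V) = \sum_{s=1}^{e} P_\xi(V_s)\log\frac{P_\xi(V_s)}{P_\eta(V_s)} = \sum_{s=1}^{e} P_{\xi^{(n)}}(V_s^{(n)})\log\frac{P_{\xi^{(n)}}(V_s^{(n)})}{P_{\eta^{(n)}}(V_s^{(n)})} = h(\xi^{(n)}_{V^{(n)}}:\eta^{(n)}_{V^{(n)}}) \qquad (I.2.3.13)$$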
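The monotonicity (I.2.3.4) can be observed numerically: passing to the first n coordinates is exactly what repeated application of the maps $F_n$ of (I.2.3.2) does. This is a minimal Python sketch assuming two hypothetical joint distributions on $\{0,1\}^4$ drawn at random (not data from the source); with finitely many coordinates the sequence simply reaches $h(\xi:\eta)$ at n = 4, whereas the theorem itself concerns the infinite product.

```python
import itertools
import math
import random


def rel_entropy(p, q):
    # assumes q(x) > 0 wherever p(x) > 0
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def marginal(joint, n):
    """pmf of the first n coordinates (repeated application of the maps F_n)."""
    out = {}
    for x, px in joint.items():
        key = x[:n]
        out[key] = out.get(key, 0.0) + px
    return out


def random_pmf(points, rng):
    weights = [rng.random() + 0.01 for _ in points]  # strictly positive weights
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


rng = random.Random(0)
points = list(itertools.product((0, 1), repeat=4))  # X^(4) = {0, 1}^4
P, Q = random_pmf(points, rng), random_pmf(points, rng)  # hypothetical laws
h_seq = [rel_entropy(marginal(P, n), marginal(Q, n)) for n in range(1, 5)]
print(h_seq)  # nondecreasing in n, as (I.2.3.4) asserts
```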




$$\mu^+(T) = \sup_{Z^+ \in U_{S^+}} \sum_s \mu(T \cap Z_s^+) = \mu(T \cap X^+) \qquad (I.2.4.2)$$

$$\mu^-(T) = \sup_{Z^- \in U_{S^-}} \sum_s |\mu(T \cap Z_s^-)| = -\mu\Big(T \cap \bigcup_s Z_s^-\Big) = -\mu(T \cap X^-) \qquad (I.2.4.3)$$

$$|\mu|(T) = \sup_{Z \in U_S} \sum_s |\mu(T \cap Z_s)| = \sup_{Z^+ \in U_{S^+}} \sum_s \mu(T \cap Z_s^+) + \sup_{Z^- \in U_{S^-}} \sum_s |\mu(T \cap Z_s^-)| = \mu^+(T) + \mu^-(T) \qquad (I.2.4.4)$$

The set functions $\mu^+$, $\mu^-$, $|\mu|$ are named the positive (or upper) variation of μ, the negative (or lower) variation of μ, and the total variation of μ. Each of them is a measure and

$$\mu(T) = \mu^+(T) - \mu^-(T) \qquad (I.2.4.5)$$

If μ is finite or σ-finite, so are its variations (see Theorem B, §29, p. 123 [4]). In what follows we will denote

$$\|\mu\| = |\mu|(X) \qquad (I.2.4.5')$$

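Theorem 1.7. If $\mu_1, \mu_2$ are measures on $(X, S)$, there exists a set $X_0 \in S$ with $\mu_2(X_0) = 0$ and a non-negative S-measurable function $a(x)$ such that for any $T \in S$

$$|\mu_1 - \mu_2|(T) = \int_T |a(x) - 1|\,\mu_2(dx) + \mu_1(T \cap X_0) \qquad (I.2.4.6)$$

If $\mu_1$ is absolutely continuous with respect to $\mu_2$ on X (which we will denote in the future by $\mu_1 \ll \mu_2$), then $X_0 = \emptyset$ and

$$a(x) = \frac{\mu_1(dx)}{\mu_2(dx)} \qquad (I.2.4.7)$$

so that

$$|\mu_1 - \mu_2|(T) = \int_T |a(x) - 1|\,\mu_2(dx) \qquad (I.2.4.8)$$

If, moreover, $\mu_1 \ll \nu$, $\mu_2 \ll \nu$ and $\pi_1(x) = \mu_1(dx)/\nu(dx)$, $\pi_2(x) = \mu_2(dx)/\nu(dx)$ are the corresponding densities, then

$$a(x) = \frac{\pi_1(x)}{\pi_2(x)} \qquad (I.2.4.7')$$

and

$$|\mu_1 - \mu_2|(T) = \int_T |\pi_1(x) - \pi_2(x)|\,\nu(dx) \qquad (I.2.4.8')$$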


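Proof. It is known that for arbitrary measures $\mu_1, \mu_2$, $\mu = \mu_1 - \mu_2$ is a signed measure. It is also known (Lebesgue decomposition) that there exists a set $X_0 \in S$ with $\mu_2(X_0) = 0$ such that $\mu_1 \ll \mu_2$ on $X - X_0$ and

$$\mu_1(T_0) = \int_{T_0} a(x)\,\mu_2(dx) \qquad (I.2.4.9)$$

where $T_0 \subset X - X_0$ and $a(x)$ is an S-measurable function. Consequently, for $T_0 \subset X - X_0$

$$(\mu_1 - \mu_2)(T_0) = \mu_1(T_0) - \mu_2(T_0) = \int_{T_0} [a(x) - 1]\,\mu_2(dx) \qquad (I.2.4.10)$$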


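Because the total variation of $\mu_1 - \mu_2$ is a measure on S, it follows that for $T \in S$

$$|\mu_1 - \mu_2|(T) = |\mu_1 - \mu_2|[T \cap (X - X_0)] + |\mu_1 - \mu_2|(T \cap X_0) \qquad (I.2.4.11)$$

Now we will calculate the two terms of this sum, with $X^+$, $X^-$ the positive and negative sets of $\mu_1 - \mu_2$ as in (I.2.4.2), (I.2.4.3).

The first term is

$$\begin{aligned} |\mu_1 - \mu_2|[T \cap (X - X_0)] &= (\mu_1 - \mu_2)^+[T \cap (X - X_0)] + (\mu_1 - \mu_2)^-[T \cap (X - X_0)] \\ &= (\mu_1 - \mu_2)[T \cap (X - X_0) \cap X^+] - (\mu_1 - \mu_2)[T \cap (X - X_0) \cap X^-] \\ &= \int_{T \cap (X - X_0) \cap X^+} [a(x) - 1]\,\mu_2(dx) - \int_{T \cap (X - X_0) \cap X^-} [a(x) - 1]\,\mu_2(dx) \\ &= \int_{T \cap (X - X_0) \cap X^+} |a(x) - 1|\,\mu_2(dx) + \int_{T \cap (X - X_0) \cap X^-} |a(x) - 1|\,\mu_2(dx) \\ &= \int_{T \cap (X - X_0)} |a(x) - 1|\,\mu_2(dx) + \int_{T \cap X_0} |a(x) - 1|\,\mu_2(dx) = \int_T |a(x) - 1|\,\mu_2(dx) \end{aligned} \qquad (I.2.4.12)$$

since $\mu_2(X_0) = 0$.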


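The second term is

$$\begin{aligned} |\mu_1 - \mu_2|(T \cap X_0) &= (\mu_1 - \mu_2)^+(T \cap X_0) + (\mu_1 - \mu_2)^-(T \cap X_0) \\ &= \mu_1[(T \cap X_0) \cap X^+] + \mu_1[(T \cap X_0) \cap X^-] = \mu_1[(T \cap X_0) \cap (X^+ \cup X^-)] = \mu_1(T \cap X_0) \end{aligned} \qquad (I.2.4.13)$$

because $\mu_2$ vanishes on the subsets of $X_0$, so that there $\mu_1 - \mu_2 = \mu_1 \ge 0$; in particular the negative variation vanishes on $X_0$ and $\mu_1[(T \cap X_0) \cap X^-] = 0$.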


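From (I.2.4.11), (I.2.4.12), (I.2.4.13) it follows (I.2.4.6), and from (I.2.4.7'), (I.2.4.8) it follows (I.2.4.8'), because $|a(x) - 1|\,\mu_2(dx) = |\pi_1(x) - \pi_2(x)|\,\nu(dx)$. So our theorem is proved.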
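On a finite space Theorem 1.7 can be checked by brute force. The following minimal Python sketch assumes two hypothetical finite measures (the values and the helper names `signed`, `variations`, `tv_lebesgue` are ours). It computes the upper and lower variations of $\mu = \mu_1 - \mu_2$ as suprema over subsets and compares $\|\mu_1 - \mu_2\|$ with the right-hand side of (I.2.4.6) for T = X, where $X_0$ is the set on which $\mu_2$ vanishes.

```python
from itertools import combinations


def signed(mu1, mu2, T):
    """(mu1 - mu2)(T) for a finite set T of points."""
    return sum(mu1.get(x, 0.0) - mu2.get(x, 0.0) for x in T)


def variations(mu1, mu2):
    """Upper and lower variations of mu = mu1 - mu2 on the whole space,
    obtained by brute force over all subsets (only feasible for a few points)."""
    X = sorted(set(mu1) | set(mu2))
    subsets = [T for r in range(len(X) + 1) for T in combinations(X, r)]
    upper = max(signed(mu1, mu2, T) for T in subsets)
    lower = max(-signed(mu1, mu2, T) for T in subsets)
    return upper, lower


def tv_lebesgue(mu1, mu2):
    """Right-hand side of (I.2.4.6) with T = X: the integral of |a - 1| d mu2
    over {mu2 > 0}, where a = mu1/mu2, plus mu1(X0) with X0 = {mu2 = 0}."""
    X = set(mu1) | set(mu2)
    regular = sum(abs(mu1.get(x, 0.0) / mu2[x] - 1.0) * mu2[x]
                  for x in X if mu2.get(x, 0.0) > 0)
    singular = sum(mu1.get(x, 0.0) for x in X if mu2.get(x, 0.0) == 0)
    return regular + singular


# Hypothetical finite measures; mu1 charges "d", where mu2 vanishes.
mu1 = {"a": 0.5, "b": 0.2, "c": 0.1, "d": 0.4}
mu2 = {"a": 0.3, "b": 0.6, "c": 0.1}
up, low = variations(mu1, mu2)
print(up + low, tv_lebesgue(mu1, mu2))  # both are 1.0 up to rounding
```

The point "d" contributes through the singular term $\mu_1(T \cap X_0)$, which is exactly the part of (I.2.4.6) that disappears when $\mu_1 \ll \mu_2$.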
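Let ξ, η be given. If $P_\xi \ll P_\eta$, from Theorem 1.3, formula (I.1.8.1), it follows

$$h(\xi:\eta) = \int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) \qquad (I.2.4.14)$$

and from Theorem 1.7, formulas (I.2.4.8) and (I.2.4.5'), it follows

$$\|P_\xi - P_\eta\| = \int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx) \qquad (I.2.4.15)$$

where

$$a_{\xi:\eta}(x) = \frac{P_\xi(dx)}{P_\eta(dx)} \qquad (I.2.4.16)$$

Theorem 1.8.

a) For arbitrary random vectors ξ, η the inequality

$$\|P_\xi - P_\eta\|^2 \le 2\,h(\xi:\eta) \qquad (I.2.4.17)$$

holds.

b) For arbitrarily small $\delta > 0$ there exist random vectors ξ, η such that

$$\|P_\xi - P_\eta\|^2 > (2 - \delta)\,h(\xi:\eta) \qquad (I.2.4.17')$$

so in (I.2.4.17) the constant 2 cannot be replaced by a smaller one.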

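Proof. In the case when $P_\xi$ is not absolutely continuous with respect to $P_\eta$, the right-hand member of (I.2.4.17) is not finite, so (I.2.4.17) is trivially true; it remains to be proved only in the case when $P_\xi \ll P_\eta$. In this case

$$\|P_\xi - P_\eta\| = \int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx), \qquad h(\xi:\eta) = \int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) \qquad (I.2.4.18)$$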


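a) Let us consider the function

$$\psi(z) = z\log z - z + 1 \qquad (0 \le z < \infty) \qquad (I.2.4.19)$$

Because $\psi'(z) = \log z$, $\psi''(z) = 1/z$, it is easy to see that for $z = 1$ this function has a minimum $\psi(1) = 0$, and that it is convex and non-negative, so that

$$\psi(z) \ge 0 \qquad (0 \le z < \infty) \qquad (I.2.4.19')$$

Let us consider the expression

$$\varphi(z) = \frac{2}{3}(2 + z)\,\psi(z) - (z - 1)^2 = \frac{2}{3}(2 + z)(z\log z - z + 1) - (z - 1)^2 \qquad (I.2.4.20)$$

Because

$$\varphi(1) = 0 \qquad (I.2.4.21)$$

$$\varphi'(z) = \frac{2}{3}(z\log z - z + 1) + \frac{2}{3}(2 + z)\log z - 2(z - 1) \qquad (I.2.4.22)$$

$$\varphi'(1) = 0 \qquad (I.2.4.23)$$

$$\varphi''(z) = \frac{4}{3z}(z\log z - z + 1) = \frac{4}{3z}\,\psi(z) \qquad (I.2.4.24)$$

the point $z = 1$ is a stationary point of $\varphi(z)$ with $\varphi(1) = 0$, and from (I.2.4.19'), (I.2.4.24) it follows that

$$\varphi''(z) \ge 0 \qquad (z > 0) \qquad (I.2.4.25)$$

i.e. $\varphi(z)$ is convex, so that its minimum is attained at $z = 1$, and consequently

$$\varphi(z) \ge 0 \qquad (z \ge 0) \qquad (I.2.4.25')$$

i.e.

$$(z - 1)^2 \le \frac{2}{3}(2 + z)(z\log z - z + 1) \qquad (z \ge 0) \qquad (I.2.4.26)$$

or

$$(z - 1)^2 \le \frac{2}{3}(2 + z)\,\psi(z) \qquad (z \ge 0) \qquad (I.2.4.27)$$

or

$$|z - 1| \le \Big(\frac{4}{3} + \frac{2}{3}z\Big)^{1/2}\psi^{1/2}(z) \qquad (z \ge 0) \qquad (I.2.4.28)$$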
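The elementary inequality (I.2.4.26) can be spot-checked on a grid. This is only a numerical sanity check on an interval we chose ($0 \le z \le 20$), not a substitute for the convexity argument above; the Python sketch evaluates ψ of (I.2.4.19) and φ of (I.2.4.20).

```python
import math


def psi(z):
    # psi(z) = z log z - z + 1, with the convention 0 log 0 = 0  (I.2.4.19)
    return (z * math.log(z) if z > 0 else 0.0) - z + 1.0


def phi(z):
    # phi(z) = (2/3)(2 + z) psi(z) - (z - 1)^2  (I.2.4.20)
    return 2.0 / 3.0 * (2.0 + z) * psi(z) - (z - 1.0) ** 2


grid = [i / 1000 for i in range(20001)]  # z = 0, 0.001, ..., 20
smallest = min(phi(z) for z in grid)
print(smallest)  # 0, attained at z = 1
```

Near $z = 1$ one has $\varphi(z) \approx (z-1)^4/18$, so the margin is very thin there, which is why the argument in the text goes through the second derivative rather than through a direct estimate.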


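Replacing in (I.2.4.28) z with $a_{\xi:\eta}(x)$, we obtain

$$|a_{\xi:\eta}(x) - 1| \le \Big(\frac{4}{3} + \frac{2}{3}a_{\xi:\eta}(x)\Big)^{1/2}\psi^{1/2}(a_{\xi:\eta}(x)) \qquad (I.2.4.29)$$

from which it follows

$$\int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx) \le \int_X \Big(\frac{4}{3} + \frac{2}{3}a_{\xi:\eta}(x)\Big)^{1/2}\psi^{1/2}(a_{\xi:\eta}(x))\,P_\eta(dx) \qquad (I.2.4.30)$$

From the Cauchy-Schwarz inequality, using $\int_X a_{\xi:\eta}(x)\,P_\eta(dx) = \int_X P_\xi(dx) = 1$, we obtain

$$\begin{aligned} \|P_\xi - P_\eta\|^2 &= \Big(\int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx)\Big)^2 \le \Big(\int_X \Big(\frac{4}{3} + \frac{2}{3}a_{\xi:\eta}(x)\Big)^{1/2}\psi^{1/2}(a_{\xi:\eta}(x))\,P_\eta(dx)\Big)^2 \\ &\le \int_X \Big(\frac{4}{3} + \frac{2}{3}a_{\xi:\eta}(x)\Big)P_\eta(dx) \cdot \int_X \psi(a_{\xi:\eta}(x))\,P_\eta(dx) \\ &= \Big[\frac{4}{3}\int_X P_\eta(dx) + \frac{2}{3}\int_X a_{\xi:\eta}(x)\,P_\eta(dx)\Big] \cdot \Big[\int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) - \int_X a_{\xi:\eta}(x)\,P_\eta(dx) + \int_X P_\eta(dx)\Big] \\ &= \Big(\frac{4}{3} + \frac{2}{3}\Big)\Big[\int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) - 1 + 1\Big] \\ &= 2\int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) = 2\,h(\xi:\eta) \end{aligned} \qquad (I.2.4.31)$$

which proves (I.2.4.17).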
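The inequality (I.2.4.17) can be probed empirically on finite spaces. The following minimal Python sketch draws pairs of hypothetical strictly positive distributions at random (our own sampling scheme, not from the source) and records the largest observed ratio $\|P_\xi - P_\eta\|^2 / h(\xi:\eta)$; by (I.2.4.17) it never exceeds 2.

```python
import math
import random


def rel_entropy(p, q):
    # h(xi:eta) for pmfs on {0, ..., k-1}; assumes q[i] > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def variation(p, q):
    # ||P_xi - P_eta|| = |P_xi - P_eta|(X), cf. (I.2.4.5'), (I.2.4.15)
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_pmf(k, rng):
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)
worst = 0.0
for _ in range(10_000):
    k = rng.randint(2, 6)
    p, q = random_pmf(k, rng), random_pmf(k, rng)
    h = rel_entropy(p, q)
    if h > 0:
        worst = max(worst, variation(p, q) ** 2 / h)
print(worst)  # largest observed ||P - Q||^2 / h; (I.2.4.17) says it is <= 2
```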

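b) Let ξ be a random variable such that there exists a set $Z_0 \in S$ with

$$P_\xi(Z_0) = P_\xi(X - Z_0) = \frac{1}{2} \qquad (I.2.4.32)$$

Because for any $Z \in S$

$$P_\xi(Z) = \int_Z 1 \cdot P_\xi(dx) \qquad (I.2.4.33)$$

it follows that the density of $P_\xi$ with respect to itself is

$$\pi_\xi(x) = 1 \qquad (x \in X) \qquad (I.2.4.34)$$

Let δ be an arbitrary number $(0 < \delta < 1)$, and let us define the function

$$\pi_\eta(x) = 1 - \delta \qquad (x \in Z_0) \qquad (I.2.4.35)$$

$$\pi_\eta(x) = 1 + \delta \qquad (x \in X - Z_0) \qquad (I.2.4.36)$$

Because $\pi_\eta(x) > 0$ $(x \in X)$ and

$$\int_X \pi_\eta(x)\,P_\xi(dx) = 1 \qquad (I.2.4.37)$$

it follows that $\pi_\eta(x)$ is the probability density, with respect to the measure $P_\xi$, of some random variable η. Consequently

$$h(\xi:\eta) = \int_{Z_0} \pi_\xi(x)\log\frac{\pi_\xi(x)}{\pi_\eta(x)}\,P_\xi(dx) + \int_{X - Z_0} \pi_\xi(x)\log\frac{\pi_\xi(x)}{\pi_\eta(x)}\,P_\xi(dx) = \frac{1}{2}\log\frac{1}{1 - \delta} + \frac{1}{2}\log\frac{1}{1 + \delta} = \frac{1}{2}\log\frac{1}{1 - \delta^2} \qquad (I.2.4.38)$$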

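Let us consider the function

$$g(z) = \log(1 - z)^{-1} \qquad (I.2.4.39)$$

It is easy to calculate that

$$\frac{d^n g}{dz^n} = (n - 1)!\,(1 - z)^{-n} \qquad (n \ge 1) \qquad (I.2.4.40)$$

so that

$$g(0) = 0, \qquad g^{(n)}(0) = (n - 1)! \qquad (n \ge 1) \qquad (I.2.4.41)$$

and consequently

$$g(z) = \sum_{k=1}^{\infty} \frac{z^k}{k} \qquad (I.2.4.42)$$

a convergent series for $|z| < 1$. Because $0 < \delta^2 < 1$, and from (I.2.4.38) we know that

$$h(\xi:\eta) = \frac{1}{2}\,g(\delta^2) \qquad (I.2.4.43)$$

from (I.2.4.42) it follows

$$h(\xi:\eta) = \frac{1}{2}\sum_{k=1}^{\infty} \frac{\delta^{2k}}{k} \qquad (I.2.4.44)$$

i.e.

$$2\,h(\xi:\eta) = \delta^2 + \sum_{k=2}^{\infty} \frac{\delta^{2k}}{k} \qquad (I.2.4.45)$$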


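Also from (I.2.4.5'), (I.2.4.8') it follows that

$$\|P_\xi - P_\eta\| = \int_X |\pi_\xi(x) - \pi_\eta(x)|\,P_\xi(dx) = \int_{Z_0} |1 - (1 - \delta)|\,P_\xi(dx) + \int_{X - Z_0} |1 - (1 + \delta)|\,P_\xi(dx) = \frac{\delta}{2} + \frac{\delta}{2} = \delta \qquad (I.2.4.46)$$

so that

$$\|P_\xi - P_\eta\|^2 = \delta^2 \qquad (I.2.4.47)$$

From (I.2.4.45), (I.2.4.47) we obtain

$$(2 - \delta)\,h(\xi:\eta) - \|P_\xi - P_\eta\|^2 = \Big(1 - \frac{\delta}{2}\Big)\Big[\delta^2 + \sum_{k=2}^{\infty} \frac{\delta^{2k}}{k}\Big] - \delta^2 = \Big(1 - \frac{\delta}{2}\Big)\sum_{k=2}^{\infty} \frac{\delta^{2k}}{k} - \frac{\delta^3}{2} \qquad (I.2.4.48)$$

If δ is sufficiently small (the left-hand side below is of order $\delta^4/2$),

$$\Big(1 - \frac{\delta}{2}\Big)\sum_{k=2}^{\infty} \frac{\delta^{2k}}{k} < \frac{\delta^3}{4} \qquad (I.2.4.49)$$

so that from (I.2.4.48) it follows

$$(2 - \delta)\,h(\xi:\eta) - \|P_\xi - P_\eta\|^2 < \frac{\delta^3}{4} - \frac{\delta^3}{2} = -\frac{\delta^3}{4} < 0 \qquad (I.2.4.50)$$

Consequently, for ξ, η as defined above, (I.2.4.17') is satisfied.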
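The construction of part b) can be evaluated directly: since both densities are constant on $Z_0$ and on $X - Z_0$, only the two cells matter. The following minimal Python sketch (the function name `two_point_example` is ours) checks the closed forms (I.2.4.38), (I.2.4.46) and the strict inequality (I.2.4.17') for a few values of δ.

```python
import math


def two_point_example(delta):
    """Part b): P_xi(Z0) = P_xi(X - Z0) = 1/2, and the density of P_eta
    with respect to P_xi is 1 - delta on Z0 and 1 + delta on X - Z0."""
    p = [0.5, 0.5]                                # P_xi(Z0), P_xi(X - Z0)
    q = [0.5 * (1 - delta), 0.5 * (1 + delta)]    # P_eta(Z0), P_eta(X - Z0)
    h = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    v = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return h, v


for delta in (0.3, 0.1, 0.01):
    h, v = two_point_example(delta)
    # closed forms (I.2.4.38), (I.2.4.46): h = (1/2) log 1/(1 - delta^2), v = delta
    print(delta, v ** 2 / h, 2 - delta)  # the ratio exceeds 2 - delta, cf. (I.2.4.17')
```

As δ decreases the ratio $\|P_\xi - P_\eta\|^2/h(\xi:\eta)$ approaches 2 from below, which is the sense in which the constant in (I.2.4.17) is sharp.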

1.2.5. Theorem 1.9.

a) If ξ, η are given random vectors and $p = p_0$ $(0 \le p_0 < 1)$ satisfies the relation

$$h(\xi:\eta) + h(\eta:\xi) = 2p\log\frac{1 + p}{1 - p} \qquad (I.2.5.1)$$

then

$$\|P_\xi - P_\eta\| \le 2p_0 \qquad (I.2.5.2)$$

b) Among all pairs of vectors with the same value of

$$h(\xi:\eta) + h(\eta:\xi) \qquad (I.2.5.1')$$

there exists a pair $\xi_0, \eta_0$ such that

$$\|P_{\xi_0} - P_{\eta_0}\| = 2p_0 \qquad (I.2.5.2')$$

so that the relation (I.2.5.2) cannot be improved.

Proof.

a₁) It is easy to see that

$$F(p) = 2p\log\frac{1 + p}{1 - p} \qquad (0 \le p < 1) \qquad (I.2.5.3)$$

has the derivative

$$F'(p) = \frac{4p}{1 - p^2} + 2\log\frac{1 + p}{1 - p} \qquad (I.2.5.3')$$

Because $F(0) = 0$ and $F'(p) \ge 0$ $(0 \le p < 1)$, it follows that $F(p)$ is monotonically increasing in the interval $0 \le p < 1$ and has range $[0, +\infty)$.

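More than that, it can be written as

$$F(p) = 4\sum_{m=1}^{\infty} \frac{p^{2m}}{2m - 1} \qquad (I.2.5.4)$$

from which it is also seen that it is increasing. Consequently, for any value of the constant J, the equation

$$2p\log\frac{1 + p}{1 - p} = J \qquad (0 \le J < \infty) \qquad (I.2.5.5)$$

has exactly one solution

$$p = p(J), \qquad 0 \le p(J) < 1 \qquad (I.2.5.5')$$

Given the random vectors ξ, η, we define the value of $J_0$ by

$$J_0 = h(\xi:\eta) + h(\eta:\xi) \qquad (I.2.5.6)$$

Denoting by

$$p_0 = p(J_0) \qquad (I.2.5.6')$$

the solution of the equation (I.2.5.5) with $J = J_0$, let us denote (for $p_0 > 0$; if $p_0 = 0$, then $J_0 = 0$, $P_\xi = P_\eta$ and (I.2.5.2) is trivial)

$$\alpha = \frac{2p_0^2}{1 - p_0^2}, \qquad \tau = \frac{J_0 + 2\alpha}{2p_0} = \log\frac{1 + p_0}{1 - p_0} + \frac{2p_0}{1 - p_0^2} \qquad (I.2.5.7)$$

We will prove that the inequality

$$\tau\,|z - 1| \le (z - 1)\log z + \alpha(z + 1) \qquad (I.2.5.8)$$

is satisfied for all $z \ge 0$.

If (I.2.5.8) is satisfied for a value of z, it is satisfied also for $1/z$, because

$$\tau\,\Big|\frac{1}{z} - 1\Big| \le \Big(\frac{1}{z} - 1\Big)\log\frac{1}{z} + \alpha\Big(\frac{1}{z} + 1\Big) \qquad (I.2.5.8')$$

is another manner of writing (I.2.5.8).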
Because the function $\varphi(z)$ has a minimum equal to zero at the value of z given by (I.2.5.14), because

$$\varphi(+0) = \varphi(+\infty) = +\infty \qquad (I.2.5.15)$$

and because from (I.2.5.11'') it is seen that $\varphi(z)$ is convex for $z \ge 0$, it follows that $\varphi(z) \ge 0$ $(z \ge 0)$, i.e. the inequality (I.2.5.8'') is satisfied, and hence the inequality (I.2.5.8) is satisfied.
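a₂) Let us replace in (I.2.5.8) z with $a_{\xi:\eta}(x)$, so that we obtain

$$\tau\,|a_{\xi:\eta}(x) - 1| \le (a_{\xi:\eta}(x) - 1)\log a_{\xi:\eta}(x) + \alpha\,(a_{\xi:\eta}(x) + 1) \qquad (I.2.5.16)$$

and by integration with respect to $P_\eta$ over all X,

$$\tau\int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx) \le \int_X (a_{\xi:\eta}(x) - 1)\log a_{\xi:\eta}(x)\,P_\eta(dx) + \alpha\int_X (a_{\xi:\eta}(x) + 1)\,P_\eta(dx) \qquad (I.2.5.17)$$

Now it is easy to calculate (when $J_0 < \infty$, $P_\xi$ and $P_\eta$ are mutually absolutely continuous and $a_{\eta:\xi}(x) = 1/a_{\xi:\eta}(x)$)

$$\int_X (a_{\xi:\eta}(x) - 1)\log a_{\xi:\eta}(x)\,P_\eta(dx) = \int_X a_{\xi:\eta}(x)\log a_{\xi:\eta}(x)\,P_\eta(dx) + \int_X \log a_{\eta:\xi}(x)\,P_\eta(dx) = h(\xi:\eta) + h(\eta:\xi) = J_0 \qquad (I.2.5.18)$$

$$\int_X (a_{\xi:\eta}(x) + 1)\,P_\eta(dx) = \int_X a_{\xi:\eta}(x)\,P_\eta(dx) + \int_X P_\eta(dx) = \int_X P_\xi(dx) + \int_X P_\eta(dx) = 2 \qquad (I.2.5.19)$$

From (I.2.4.15) we have

$$\int_X |a_{\xi:\eta}(x) - 1|\,P_\eta(dx) = \|P_\xi - P_\eta\| \qquad (I.2.5.20)$$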
With the results (I.2.5.18), (I.2.5.19), (I.2.5.20), the inequality (I.2.5.17) can be written as

$$\tau\,\|P_\xi - P_\eta\| \le J_0 + 2\alpha \qquad (I.2.5.21)$$

and because of (I.2.5.7)

$$\|P_\xi - P_\eta\| \le \frac{J_0 + 2\alpha}{\tau} = 2p_0 \qquad (I.2.5.21')$$

i.e. (I.2.5.2).


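b) Let us consider two random variables $\xi_0, \eta_0$, both taking only the values $x_1, x_2$, with probabilities given by

$$P_{\xi_0}(x_1) = P_{\eta_0}(x_2) = \frac{1 + p_0}{2}, \qquad P_{\xi_0}(x_2) = P_{\eta_0}(x_1) = \frac{1 - p_0}{2} \qquad (I.2.5.22)$$

Then

$$h(\xi_0:\eta_0) = P_{\xi_0}(x_1)\log\frac{P_{\xi_0}(x_1)}{P_{\eta_0}(x_1)} + P_{\xi_0}(x_2)\log\frac{P_{\xi_0}(x_2)}{P_{\eta_0}(x_2)} = \frac{1 + p_0}{2}\log\frac{1 + p_0}{1 - p_0} + \frac{1 - p_0}{2}\log\frac{1 - p_0}{1 + p_0} = p_0\log\frac{1 + p_0}{1 - p_0} = h(\eta_0:\xi_0) \qquad (I.2.5.23)$$

so that $h(\xi_0:\eta_0) + h(\eta_0:\xi_0) = 2p_0\log\frac{1 + p_0}{1 - p_0}$, while

$$\|P_{\xi_0} - P_{\eta_0}\| = \Big|\frac{1 + p_0}{2} - \frac{1 - p_0}{2}\Big| + \Big|\frac{1 - p_0}{2} - \frac{1 + p_0}{2}\Big| = 2p_0 \qquad (I.2.5.24)$$

which proves the last statement of the theorem.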
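Theorem 1.9 can be exercised numerically by solving (I.2.5.5) for p. The following is a minimal Python sketch under our own assumptions: the root of the increasing function F of (I.2.5.3) is found by plain bisection (the names `F`, `p_of_J`, `J_divergence` are hypothetical), the random distributions are strictly positive so that $J_0 < \infty$, and the extremal pair uses $p_0 = 0.4$ as an arbitrary choice.

```python
import math
import random


def F(p):
    # F(p) = 2 p log((1 + p)/(1 - p)), increasing on [0, 1)  (I.2.5.3)
    return 2.0 * p * math.log((1.0 + p) / (1.0 - p))


def p_of_J(J, tol=1e-13):
    """Unique root p in [0, 1) of F(p) = J, cf. (I.2.5.5), by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < J:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def J_divergence(p, q):
    # h(xi:eta) + h(eta:xi) = sum (p - q) log(p/q) for strictly positive pmfs
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def variation(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_pmf(k, rng):
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(2)
slack = float("inf")
for _ in range(2000):
    p, q = random_pmf(4, rng), random_pmf(4, rng)
    slack = min(slack, 2 * p_of_J(J_divergence(p, q)) - variation(p, q))
print(slack)  # (I.2.5.2) predicts slack >= 0

# Extremal pair of part b) with p0 = 0.4: the bound is attained.
p0 = 0.4
P0 = [(1 + p0) / 2, (1 - p0) / 2]
Q0 = [(1 - p0) / 2, (1 + p0) / 2]
print(p_of_J(J_divergence(P0, Q0)), variation(P0, Q0))  # about 0.4 and 0.8
```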


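1.2.6. Theorem 1.10.

If ξ, η are two random vectors and $\varepsilon > 0$ is arbitrary, then

$$P_\xi\{x: |i_{\xi:\eta}(x)| > \varepsilon\} \le \frac{1 + \varepsilon}{\varepsilon}\,\|P_\xi - P_\eta\| \qquad (I.2.6.1)$$

with $i_{\xi:\eta}(x)$ given by (I.1.8.4).

Proof. Let

$$A = \{x: |\log a_{\xi:\eta}(x)| = |i_{\xi:\eta}(x)| > \varepsilon\} \qquad (I.2.6.2)$$

$$A_1 = \{x: \log a_{\xi:\eta}(x) > \varepsilon\}, \qquad A_2 = \{x: \log a_{\xi:\eta}(x) < -\varepsilon\} \qquad (I.2.6.3)$$

So

$$A = A_1 \cup A_2, \qquad A_1 \cap A_2 = \emptyset \qquad (I.2.6.4)$$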
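Because $e^{\varepsilon} > 1 + \varepsilon$, it follows that

$$\varepsilon > \log(1 + \varepsilon) \qquad (I.2.6.5)$$

and

$$A_1 = \{x: \log a_{\xi:\eta}(x) > \varepsilon\} \subset \{x: \log a_{\xi:\eta}(x) > \log(1 + \varepsilon)\} = \{x: a_{\xi:\eta}(x) > 1 + \varepsilon\} = \Big\{x: 1 - \frac{1}{a_{\xi:\eta}(x)} > \frac{\varepsilon}{1 + \varepsilon}\Big\} \qquad (I.2.6.6)$$

Because (I.2.6.5) can be written as

$$e^{\varepsilon} > 1 + \varepsilon \qquad (I.2.6.7)$$

or

$$-\varepsilon < -\log(1 + \varepsilon) \qquad (I.2.6.7')$$

it follows that

$$A_2 = \{x: \log a_{\xi:\eta}(x) < -\varepsilon\} = \Big\{x: \varepsilon < \log\frac{1}{a_{\xi:\eta}(x)}\Big\} = \Big\{x: \frac{1}{a_{\xi:\eta}(x)} > e^{\varepsilon}\Big\} \subset \Big\{x: \frac{1}{a_{\xi:\eta}(x)} > 1 + \varepsilon\Big\} = \Big\{x: 1 - \frac{1}{a_{\xi:\eta}(x)} < -\varepsilon\Big\} \qquad (I.2.6.8)$$

From (I.2.6.4), (I.2.6.6), (I.2.6.8) it follows that

$$A = A_1 \cup A_2 \subset \Big\{x: 1 - \frac{1}{a_{\xi:\eta}(x)} > \frac{\varepsilon}{1 + \varepsilon}\Big\} \cup \Big\{x: 1 - \frac{1}{a_{\xi:\eta}(x)} < -\varepsilon\Big\} \subset K = \Big\{x: \Big|1 - \frac{1}{a_{\xi:\eta}(x)}\Big| > \frac{\varepsilon}{1 + \varepsilon}\Big\} \qquad (I.2.6.9)$$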


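From (I.2.6.2), (I.2.6.9) it follows

$$P_\xi\{x: |i_{\xi:\eta}(x)| > \varepsilon\} \le P_\xi(K) \qquad (I.2.6.10)$$

Let

$$N = \{x: a_{\xi:\eta}(x) = 0\}, \qquad X - N = \{x: a_{\xi:\eta}(x) \ne 0\} \qquad (I.2.6.11)$$

so that

$$P_\xi(N) = \int_N a_{\xi:\eta}(x)\,P_\eta(dx) = 0 \qquad (I.2.6.12)$$

Let us denote

$$K_1 = K \cap (X - N) \qquad (I.2.6.13)$$

so that

$$K = (K \cap N) \cup K_1 \qquad (I.2.6.14)$$

From (I.2.6.12), (I.2.6.13), (I.2.6.14) it follows that

$$P_\xi(K \cap N) = 0 \qquad (I.2.6.15)$$

$$P_\xi(K) = P_\xi(K \cap N) + P_\xi(K_1) = P_\xi(K_1) \qquad (I.2.6.15')$$

because from (I.2.6.12) and $K \cap N \subset N$ it follows

$$P_\xi(K \cap N) \le P_\xi(N) = 0 \qquad (I.2.6.12')$$

Taking into consideration (I.2.6.10), (I.2.6.15'), and using $P_\xi(dx) = a_{\xi:\eta}(x)\,P_\eta(dx)$, it follows

$$P_\xi\{x: |i_{\xi:\eta}(x)| > \varepsilon\} \le P_\xi(K_1) \le \frac{1 + \varepsilon}{\varepsilon}\int_{K_1} \Big|1 - \frac{1}{a_{\xi:\eta}(x)}\Big|\,P_\xi(dx) = \frac{1 + \varepsilon}{\varepsilon}\int_{K_1} |a_{\xi:\eta}(x) - 1|\,P_\eta(dx) \le \frac{1 + \varepsilon}{\varepsilon}\,\|P_\xi - P_\eta\| \qquad (I.2.6.16)$$

i.e. (I.2.6.1).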
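The chain of inclusions $A \subset K$ above yields a Markov-type bound of the form $P_\xi\{|i_{\xi:\eta}| > \varepsilon\} \le \frac{1+\varepsilon}{\varepsilon}\|P_\xi - P_\eta\|$. The following minimal Python sketch spot-checks a bound of this form on randomly drawn hypothetical distributions; to make the check less trivial it also draws pairs that are close to each other (mixing weight t, our own choice), where the right-hand side is small.

```python
import math
import random


def tail_and_bound(p, q, eps):
    """P_xi{|i(x)| > eps} with i = log(p/q), and ((1 + eps)/eps) * ||P_xi - P_eta||,
    for strictly positive pmfs p (law of xi) and q (law of eta)."""
    tail = sum(pi for pi, qi in zip(p, q) if abs(math.log(pi / qi)) > eps)
    bound = (1 + eps) / eps * sum(abs(pi - qi) for pi, qi in zip(p, q))
    return tail, bound


def random_pmf(k, rng):
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(3)
violations = 0
for _ in range(5000):
    p, r = random_pmf(5, rng), random_pmf(5, rng)
    t = rng.choice([0.01, 0.1, 1.0])               # hypothetical closeness parameter
    q = [(1 - t) * pi + t * ri for pi, ri in zip(p, r)]
    eps = rng.choice([0.05, 0.2, 1.0, 3.0])
    tail, bound = tail_and_bound(p, q, eps)
    violations += tail > bound
print(violations)  # 0 when the bound holds for every sampled pair
```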
1.3. Additivity theorems


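1.3.1. Let $(\Omega_i, \Sigma_i, P_i)$ be a probability space, where $\Omega_i$ is a set of elements $\omega_i$, $\Sigma_i$ a σ-algebra of subsets of $\Omega_i$ and $P_i$ a probability measure on $\Sigma_i$ $(i = 1, 2)$.

We consider the random vectors $\xi_i, \eta_i$ $(i = 1, 2)$, defined on these probability spaces, with values in the measurable spaces $(X_i, S_i)$, where $X_i$ is a set of elements $x_i$ and $S_i$ a σ-algebra of subsets of $X_i$.

Let $(\Omega, \Sigma) = (\Omega_1, \Sigma_1) \times (\Omega_2, \Sigma_2)$ be a measurable space, where Ω is the set of elements $\omega = (\omega_1, \omega_2)$, $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$, and $\Sigma = \Sigma_1 \times \Sigma_2$. Let P be a probability measure on Σ with marginals $P_i$ on $\Sigma_i$ $(i = 1, 2)$. So $(\Omega, \Sigma, P)$ is a probability space.

We consider the random vectors

$$(\xi_1, \xi_2)(\omega) = (\xi_1(\omega_1), \xi_2(\omega_2)), \qquad (\eta_1, \eta_2)(\omega) = (\eta_1(\omega_1), \eta_2(\omega_2)) \qquad (I.3.1.1)$$

defined on the probability space $(\Omega, \Sigma, P)$ with values in the measurable space $(X, S) = (X_1, S_1) \times (X_2, S_2)$, where X is the set of elements $x = (x_1, x_2)$, $x_1 \in X_1$, $x_2 \in X_2$, and $S = S_1 \times S_2$.

Let

$$P_{\xi_i}(T_i) = P_i\{\omega_i: \xi_i(\omega_i) \in T_i\}, \qquad T_i \in S_i \quad (i = 1, 2) \qquad (I.3.1.2)$$

$$P_{\eta_i}(T_i) = P_i\{\omega_i: \eta_i(\omega_i) \in T_i\}, \qquad T_i \in S_i \quad (i = 1, 2) \qquad (I.3.1.3)$$

be the probability measures of the random vectors $\xi_i, \eta_i$ $(i = 1, 2)$ and let

$$P_{\xi_1\xi_2}(T) = P\{\omega: (\xi_1(\omega_1), \xi_2(\omega_2)) \in T\}, \qquad T \in S \qquad (I.3.1.4)$$

$$P_{\eta_1\eta_2}(T) = P\{\omega: (\eta_1(\omega_1), \eta_2(\omega_2)) \in T\}, \qquad T \in S \qquad (I.3.1.5)$$

be their joint probability measures.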


Lemma 1.5. If $P_{\xi_1\xi_2}$ is absolutely continuous with respect to $P_{\eta_1\eta_2}$, then

a) $P_{\xi_1}$ is absolutely continuous with respect to $P_{\eta_1}$, $P_{\xi_2}$ is absolutely continuous with respect to $P_{\eta_2}$, and

b) $P_{\xi_2|\xi_1}(\cdot|x_1)$ is absolutely continuous with respect to $P_{\eta_2|\eta_1}(\cdot|x_1)$ (P-a.e.).

c) Moreover, if the Radon-Nikodym derivative of $P_{\xi_1\xi_2}$ with respect to $P_{\eta_1\eta_2}$ is

$$a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = \frac{P_{\xi_1\xi_2}(dx_1\,dx_2)}{P_{\eta_1\eta_2}(dx_1\,dx_2)} \qquad (I.3.1.6)$$

then the Radon-Nikodym derivative of $P_{\xi_1}$ with respect to $P_{\eta_1}$ is

$$a_{\xi_1:\eta_1}(x_1) = \frac{P_{\xi_1}(dx_1)}{P_{\eta_1}(dx_1)} = \int_{X_2} a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) \qquad (I.3.1.7)$$

(P-a.e.), the Radon-Nikodym derivative of $P_{\xi_2}$ with respect to $P_{\eta_2}$ is

$$a_{\xi_2:\eta_2}(x_2) = \frac{P_{\xi_2}(dx_2)}{P_{\eta_2}(dx_2)} = \int_{X_1} a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_1|\eta_2}(dx_1|x_2) \qquad (I.3.1.8)$$

(P-a.e.), and the Radon-Nikodym derivative of $P_{\xi_2|\xi_1}(\cdot|x_1)$ with respect to $P_{\eta_2|\eta_1}(\cdot|x_1)$ is

$$a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) = \frac{P_{\xi_2|\xi_1}(dx_2|x_1)}{P_{\eta_2|\eta_1}(dx_2|x_1)} = \frac{a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)}{a_{\xi_1:\eta_1}(x_1)} \qquad (I.3.1.9)$$

(P-a.e.).


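Proof. From (I.3.1.6) it follows

$$P_{\xi_1\xi_2}(T) = \int_T a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_1\eta_2}(dx_1\,dx_2) \qquad (I.3.1.10)$$

so that if $T = T_1 \times X_2$, $T_1 \in S_1$, from Fubini's theorem it follows that

$$P_{\xi_1}(T_1) = P_{\xi_1\xi_2}(T_1 \times X_2) = \int_{T_1}\Big[\int_{X_2} a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1)\Big]P_{\eta_1}(dx_1) = \int_{T_1} a_{\xi_1:\eta_1}(x_1)\,P_{\eta_1}(dx_1) \qquad (I.3.1.11)$$

where $a_{\xi_1:\eta_1}(x_1)$ is given by (I.3.1.7).

Similarly, if in (I.3.1.10) we take $T = X_1 \times T_2$, $T_2 \in S_2$, it follows in the same way

$$P_{\xi_2}(T_2) = \int_{T_2}\Big[\int_{X_1} a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_1|\eta_2}(dx_1|x_2)\Big]P_{\eta_2}(dx_2) = \int_{T_2} a_{\xi_2:\eta_2}(x_2)\,P_{\eta_2}(dx_2) \qquad (I.3.1.12)$$

where $a_{\xi_2:\eta_2}(x_2)$ is given by (I.3.1.8).

The relation (I.3.1.10) can be written as

$$\int_T P_{\xi_2|\xi_1}(dx_2|x_1)\,P_{\xi_1}(dx_1) = \int_T a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1)\,P_{\eta_1}(dx_1) \qquad (I.3.1.13)$$

and, using (I.3.1.7), we can write it as

$$\int_T a_{\xi_1:\eta_1}(x_1)\,P_{\xi_2|\xi_1}(dx_2|x_1)\,P_{\eta_1}(dx_1) = \int_T a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1)\,P_{\eta_1}(dx_1) \qquad (I.3.1.14)$$

for any set $T \in S_1 \times S_2$, so that

$$a_{\xi_1:\eta_1}(x_1)\,P_{\xi_2|\xi_1}(dx_2|x_1) = a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) \qquad (I.3.1.15)$$

(P-a.e.). Consequently, denoting

$$a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) = \frac{a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)}{a_{\xi_1:\eta_1}(x_1)} \qquad (I.3.1.16)$$

we obtain

$$P_{\xi_2|\xi_1}(dx_2|x_1) = a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) \qquad (I.3.1.17)$$

(P-a.e.). From (I.3.1.17) it follows

$$P_{\xi_2|\xi_1}(T_2|x_1) = \int_{T_2} a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) \qquad (I.3.1.18)$$

i.e. $P_{\xi_2|\xi_1}(\cdot|x_1)$ is absolutely continuous with respect to $P_{\eta_2|\eta_1}(\cdot|x_1)$ (P-a.e.), with (I.3.1.16) as the Radon-Nikodym derivative.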
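For finitely many points all the derivatives in Lemma 1.5 are ratios of probabilities, so (I.3.1.7) and (I.3.1.9) can be verified directly. The following minimal Python sketch uses two hypothetical joint laws on a 3 × 2 grid (random weights, not data from the source): it computes $a_{\xi_1:\eta_1}$ by integrating the joint derivative against $P_{\eta_2|\eta_1}$ and compares it, and the conditional derivative, with the directly computed ratios.

```python
import random


def random_joint(n1, n2, rng):
    weights = {(i, j): rng.random() + 0.05 for i in range(n1) for j in range(n2)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def marginal_first(joint):
    out = {}
    for (x1, _), v in joint.items():
        out[x1] = out.get(x1, 0.0) + v
    return out


rng = random.Random(4)
P = random_joint(3, 2, rng)   # hypothetical law of (xi1, xi2)
Q = random_joint(3, 2, rng)   # hypothetical law of (eta1, eta2)
P1, Q1 = marginal_first(P), marginal_first(Q)

a = {x: P[x] / Q[x] for x in P}  # joint derivative (I.3.1.6)
# (I.3.1.7): a_{xi1:eta1}(x1) = sum over x2 of a(x1, x2) Q(x2 | x1)
a1 = {x1: sum(a[(x1, x2)] * Q[(x1, x2)] / Q1[x1] for x2 in range(2)) for x1 in Q1}

# (I.3.1.9): a(x1, x2) / a1(x1) should equal P(x2 | x1) / Q(x2 | x1)
cond_ratio = {x: a[x] / a1[x[0]] for x in P}
direct = {x: (P[x] / P1[x[0]]) / (Q[x] / Q1[x[0]]) for x in P}
print(max(abs(a1[x1] - P1[x1] / Q1[x1]) for x1 in P1),
      max(abs(cond_ratio[x] - direct[x]) for x in P))  # both close to 0
```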


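Let us now denote

$$i_{\xi_1:\eta_1}(x_1) = \log a_{\xi_1:\eta_1}(x_1) \qquad (I.3.1.19)$$

$$i_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = \log a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) \qquad (I.3.1.20)$$

$$i_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) = \log a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) \qquad (I.3.1.21)$$

Lemma 1.6. In each of the relations

$$a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = a_{\xi_1:\eta_1}(x_1)\cdot a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) \qquad (I.3.1.22)$$

$$i_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = i_{\xi_1:\eta_1}(x_1) + i_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) \qquad (I.3.1.23)$$

if two of the quantities are finite, then the third one is also finite and the relation is verified.

In the case that the random vectors $\xi_1, \xi_2$ are independent, and the random vectors $\eta_1, \eta_2$ are also independent, the relations (I.3.1.22), (I.3.1.23) take the simplified form

$$a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = a_{\xi_1:\eta_1}(x_1)\cdot a_{\xi_2:\eta_2}(x_2) \qquad (I.3.1.22')$$

$$i_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = i_{\xi_1:\eta_1}(x_1) + i_{\xi_2:\eta_2}(x_2) \qquad (I.3.1.23')$$

Proof. The equality (I.3.1.22) follows from (I.3.1.9), and (I.3.1.23) follows from (I.3.1.22), taking into consideration (I.3.1.19), (I.3.1.20), (I.3.1.21). The relations (I.3.1.22'), (I.3.1.23') follow from the relations (I.3.1.22), (I.3.1.23), taking into consideration that under the conditions of independence indicated in the Lemma

$$P_{\xi_2|\xi_1}(\cdot|x_1) = P_{\xi_2}(\cdot), \qquad P_{\eta_2|\eta_1}(\cdot|x_1) = P_{\eta_2}(\cdot) \qquad (I.3.1.24)$$

so that

$$a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) = a_{\xi_2:\eta_2}(x_2) \qquad (I.3.1.25)$$

Let us denote

$$h[(\xi_2|x_1):(\eta_2|x_1)] = \int_{X_2} a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\log a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) \qquad (I.3.1.26)$$

$$h[(\xi_2|\xi_1):(\eta_2|\eta_1)] = \int_{X_1} h[(\xi_2|x_1):(\eta_2|x_1)]\,P_{\xi_1}(dx_1) \qquad (I.3.1.27)$$

The quantity (I.3.1.27) is the relative conditional entropy of the ordered pair of random vectors $\xi_1, \xi_2$ with respect to the ordered pair of random vectors $\eta_1, \eta_2$.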

Lemma 1.7.

a) If two of the three quantities in the relation

$$h[(\xi_1\xi_2):(\eta_1\eta_2)] = h(\xi_1:\eta_1) + h[(\xi_2|\xi_1):(\eta_2|\eta_1)] \qquad (I.3.1.28)$$

are finite, then the third one is also finite and they verify this relation.

b) In the particular case where $\xi_1, \xi_2$ are independent and $\eta_1, \eta_2$ are independent, the relation (I.3.1.28) takes the simplified form

$$h[(\xi_1\xi_2):(\eta_1\eta_2)] = h(\xi_1:\eta_1) + h(\xi_2:\eta_2) \qquad (I.3.1.28')$$


Proof. Because of (I.3.1.22), we obtain

$$a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\log a_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2) = [a_{\xi_1:\eta_1}(x_1)\log a_{\xi_1:\eta_1}(x_1)]\,a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2) + [a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\log a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)]\,a_{\xi_1:\eta_1}(x_1) \qquad (I.3.1.29)$$

and we integrate both members with respect to the measure $P_{\eta_1\eta_2}$ over $X_1 \times X_2$.

From Theorem 1.3, formula (I.1.8.1), it is seen that the integral of the left-hand side is $h[(\xi_1\xi_2):(\eta_1\eta_2)]$.

Let us consider now the integral of the first term of the right-hand side. We obtain from Fubini's theorem and from (I.3.1.9)

$$\int_{X_1 \times X_2} [a_{\xi_1:\eta_1}(x_1)\log a_{\xi_1:\eta_1}(x_1)]\,a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1)\,P_{\eta_1}(dx_1) = \int_{X_1} a_{\xi_1:\eta_1}(x_1)\log a_{\xi_1:\eta_1}(x_1)\Big[\int_{X_2} P_{\xi_2|\xi_1}(dx_2|x_1)\Big]P_{\eta_1}(dx_1) = h(\xi_1:\eta_1) \qquad (I.3.1.29')$$

Let us consider now the integral of the second term of the right-hand side. We obtain from Fubini's theorem and (I.3.1.7), taking into consideration (I.3.1.26), (I.3.1.27), that

$$\int_{X_1 \times X_2} [a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\log a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)]\,a_{\xi_1:\eta_1}(x_1)\,P_{\eta_2|\eta_1}(dx_2|x_1)\,P_{\eta_1}(dx_1) = \int_{X_1} P_{\xi_1}(dx_1)\int_{X_2} a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\log a_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_1, x_2)\,P_{\eta_2|\eta_1}(dx_2|x_1) = \int_{X_1} h[(\xi_2|x_1):(\eta_2|x_1)]\,P_{\xi_1}(dx_1) = h[(\xi_2|\xi_1):(\eta_2|\eta_1)] \qquad (I.3.1.29'')$$

which proves part (a) of Lemma 1.7. Part (b) follows from part (a), taking into consideration (I.3.1.25).
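The chain rule (I.3.1.28) and its independent form (I.3.1.28') can be checked on finite spaces, where the conditional entropy (I.3.1.27) is a $P_{\xi_1}$-weighted average of finite sums. The following minimal Python sketch draws hypothetical joint laws on a 3 × 2 grid and, separately, hypothetical product laws; the helper names are ours.

```python
import math
import random


def rel_entropy(p, q):
    # assumes q[x] > 0 wherever p[x] > 0
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def marginal_first(joint):
    out = {}
    for (x1, _), v in joint.items():
        out[x1] = out.get(x1, 0.0) + v
    return out


def conditional(joint, x1):
    """pmf of the second coordinate given that the first equals x1."""
    m = marginal_first(joint)[x1]
    return {x2: v / m for (y1, x2), v in joint.items() if y1 == x1}


def random_pmf(points, rng):
    weights = [rng.random() + 0.05 for _ in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


rng = random.Random(5)
pts = [(i, j) for i in range(3) for j in range(2)]
P, Q = random_pmf(pts, rng), random_pmf(pts, rng)  # hypothetical joint laws

P1, Q1 = marginal_first(P), marginal_first(Q)
# (I.3.1.26), (I.3.1.27): average of conditional relative entropies under P_{xi1}
h_cond = sum(P1[x1] * rel_entropy(conditional(P, x1), conditional(Q, x1)) for x1 in P1)
lhs = rel_entropy(P, Q)                   # h[(xi1 xi2):(eta1 eta2)]
rhs = rel_entropy(P1, Q1) + h_cond        # right-hand side of (I.3.1.28)

# Independent case (I.3.1.28'): product laws
p1, p2 = random_pmf(range(3), rng), random_pmf(range(2), rng)
q1, q2 = random_pmf(range(3), rng), random_pmf(range(2), rng)
P_ind = {(i, j): p1[i] * p2[j] for i in p1 for j in p2}
Q_ind = {(i, j): q1[i] * q2[j] for i in q1 for j in q2}
print(lhs - rhs, rel_entropy(P_ind, Q_ind) - rel_entropy(p1, q1) - rel_entropy(p2, q2))
```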
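1.3.2. Let $(\Omega_i, \Sigma_i)$ be a measurable space and $(\Omega_i, \Sigma_i, P_i)$ a probability space, where $\Omega_i$ is a set of elements $\omega_i$, $\Sigma_i$ a σ-algebra of subsets of $\Omega_i$ and $P_i$ a probability measure on $\Sigma_i$ $(1 \le i \le n)$.

We consider the random vectors $\xi_i, \eta_i$ defined on this probability space with values in the measurable space $(X_i, S_i)$, where $X_i$ is a set of elements $x_i$ and $S_i$ a σ-algebra of subsets of $X_i$ $(1 \le i \le n)$. Let

$$A \subset \{1, 2, \dots, n\} \qquad (I.3.2.1)$$

and let

$$(\Omega_A, \Sigma_A) = \mathop{\times}_{i \in A} (\Omega_i, \Sigma_i)$$

be a measurable space, where $\Omega_A$ is the set of elements $\omega_A = \{\omega_i;\ i \in A\}$ and $\Sigma_A = \mathop{\times}_{i \in A} \Sigma_i$.

In the particular case where $A = \{1, 2, \dots, k\}$, we denote $\Omega_A = \Omega^{(k)}$, $\omega_A = \omega^{(k)}$, $\Sigma_A = \Sigma^{(k)}$ $(1 \le k \le n)$. Let $P_A$ be a probability measure on $\Sigma_A$ with marginals $P_i$ $(i \in A)$. So $(\Omega_A, \Sigma_A, P_A)$ is a probability space.

We consider the random vectors $\xi_A(\omega_A)$, $\eta_A(\omega_A)$ given on the probability space $(\Omega_A, \Sigma_A, P_A)$ by

$$\xi_A(\omega_A) = \{\xi_i(\omega_i);\ i \in A\}, \qquad \eta_A(\omega_A) = \{\eta_i(\omega_i);\ i \in A\} \qquad (I.3.2.2)$$

with values in the measurable space $(X_A, S_A) = \mathop{\times}_{i \in A} (X_i, S_i)$, where $x_A = \{x_i;\ i \in A\}$, with $x_i \in X_i$ $(i \in A)$, are the elements of $X_A$ and $S_A = \mathop{\times}_{i \in A} S_i$.

If $A = \{1, 2, \dots, k\}$, then we denote $x_A = x^{(k)}$, $X_A = X^{(k)}$, $S_A = S^{(k)}$, $\xi_A = \xi^{(k)}$, $\eta_A = \eta^{(k)}$.

Let

$$P_{\xi_i}(T_i) = P_i\{\omega_i: \xi_i(\omega_i) \in T_i\}, \qquad T_i \in S_i \quad (1 \le i \le n) \qquad (I.3.2.3)$$

$$P_{\eta_i}(T_i) = P_i\{\omega_i: \eta_i(\omega_i) \in T_i\}, \qquad T_i \in S_i \quad (1 \le i \le n) \qquad (I.3.2.3')$$

be the probability measures of the random vectors $\xi_i, \eta_i$ $(1 \le i \le n)$ and let

$$P_{\xi_A}(T_A) = P_A\{\omega_A: \xi_A(\omega_A) \in T_A\}, \qquad T_A \in S_A \qquad (I.3.2.4)$$

$$P_{\eta_A}(T_A) = P_A\{\omega_A: \eta_A(\omega_A) \in T_A\}, \qquad T_A \in S_A \qquad (I.3.2.4')$$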


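Lemma 1.8. If $P_{\xi^{(n)}}$ is absolutely continuous with respect to $P_{\eta^{(n)}}$ and the Radon-Nikodym derivative of $P_{\xi^{(n)}}$ with respect to $P_{\eta^{(n)}}$ is

$$a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \frac{P_{\xi^{(n)}}(dx^{(n)})}{P_{\eta^{(n)}}(dx^{(n)})} \qquad (I.3.2.5)$$

then the following hold:

(a) for any $A \subset \{1, 2, \dots, n\}$, the probability measure $P_{\xi_A}$ is absolutely continuous with respect to $P_{\eta_A}$, and the Radon-Nikodym derivative of $P_{\xi_A}$ with respect to $P_{\eta_A}$ is given by the relation

$$a_{\xi_A:\eta_A}(x_A) = \frac{P_{\xi_A}(dx_A)}{P_{\eta_A}(dx_A)} = \int_{X_{\bar A}} a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)})\,P_{\eta_{\bar A}|\eta_A}(dx_{\bar A}|x_A) \qquad (I.3.2.6)$$

where $P_{\eta_{\bar A}|\eta_A}$ is a conditional marginal measure of $P_{\eta^{(n)}}$ and $\bar A = \{1, 2, \dots, n\} - A$.

(b) for any $A_1 \subset \{1, 2, \dots, n\}$, $A_2 \subset \{1, 2, \dots, n\}$, $A_1 \cap A_2 = \emptyset$, the probability measure

$$P_{\xi_{A_2}|\xi_{A_1}}(\cdot|x_{A_1}) \qquad (I.3.2.7)$$

(conditional marginal measure of $P_{\xi^{(n)}}$) is absolutely continuous with respect to the probability measure

$$P_{\eta_{A_2}|\eta_{A_1}}(\cdot|x_{A_1}) \qquad (I.3.2.7')$$

(conditional marginal measure of $P_{\eta^{(n)}}$) and the corresponding Radon-Nikodym derivative is

$$a_{(\xi_{A_2}|\xi_{A_1}):(\eta_{A_2}|\eta_{A_1})}(x_{A_1}, x_{A_2}) = \frac{P_{\xi_{A_2}|\xi_{A_1}}(dx_{A_2}|x_{A_1})}{P_{\eta_{A_2}|\eta_{A_1}}(dx_{A_2}|x_{A_1})} = \frac{a_{\xi_{A_1 \cup A_2}:\eta_{A_1 \cup A_2}}(x_{A_1 \cup A_2})}{a_{\xi_{A_1}:\eta_{A_1}}(x_{A_1})} \qquad (I.3.2.8)$$

(P-a.e.).

Proof.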


(a)  Let  us  consider  the  result  (a)  in  Lemma  1.9  for 


(I. 3. 2. 8) 


"  ^A‘  ^2  “  ^A’  '’i  “  ’’a*  ^2  “  '’a 


(1. 3. 2. 9) 


where  A  C  {l,2,...,n},  A  =  {l,2,...,n}  -  A.  In  this  case 


h:q  "  ^^(n)’  Vn' 


(1.3.2.10) 


and  because  P  is  absolutely  continuous  with  respect  to  P  ,  it 

follows  that  P-  is  absolutely  continuous  with  respect  to  P  .  The 
^A  ^A 

relation  (I. 3. 2. 6)  follows  from  (I. 3. 1.7). 

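(b) Let $A_1$, $A_2$ be mutually exclusive subsets of $\{1,2,\dots,n\}$, so that $A_1 \cup A_2$ is also a subset of the same set; from part (a) above it follows that $P_{\xi_{A_1 \cup A_2}}$ is absolutely continuous with respect to $P_{\eta_{A_1 \cup A_2}}$. Let now

$$\xi_1 = \xi_{A_1},\quad \xi_2 = \xi_{A_2},\quad \eta_1 = \eta_{A_1},\quad \eta_2 = \eta_{A_2} \qquad (I.3.2.11)$$

so that

$$P_{\xi_1\xi_2} = P_{\xi_{A_1 \cup A_2}},\quad P_{\eta_1\eta_2} = P_{\eta_{A_1 \cup A_2}} \qquad (I.3.2.12)$$

From Lemma I.9(b) it follows that

$$P_{\xi_{A_2}|\xi_{A_1}}(\cdot \mid x_{A_1}) \qquad (I.3.2.13)$$

is absolutely continuous with respect to

$$P_{\eta_{A_2}|\eta_{A_1}}(\cdot \mid x_{A_1}) \qquad (I.3.2.13')$$

The relation (I.3.2.8) follows from (I.3.1.9).

The Lemma is proved.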
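When all the spaces $X_i$ are finite, every measure above is a table of probabilities and every Radon-Nikodym derivative is a ratio of probabilities, so both parts of Lemma I.8 can be checked numerically. The sketch below is only an illustration under that finiteness assumption; the laws `P_xi`, `P_eta` and all helper names are hypothetical and do not come from the text. It checks (I.3.2.6) for $A = \{1\}$ and (I.3.2.8) for $A_1 = \{1\}$, $A_2 = \{2\}$.

```python
import math

# Hypothetical discrete example (not from the text): n = 2 binary coordinates.
# Joint laws of xi^(2) and eta^(2), stored as {(x1, x2): probability}.
P_xi = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
P_eta = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.2, (1, 1): 0.3}

A = (0,)  # 0-based position of A = {1}; A' = {2} is position 1


def marginal(P, idx):
    """Marginal law of the coordinates whose 0-based positions are in idx."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


# Radon-Nikodym derivative of the joint laws, as in (I.3.2.5).
a_joint = {x: P_xi[x] / P_eta[x] for x in P_xi}
P_xi_A, P_eta_A = marginal(P_xi, A), marginal(P_eta, A)


def a_A_direct(xA):
    """Left-hand side of (I.3.2.6): dP_{xi_A}/dP_{eta_A} at x_A."""
    return P_xi_A[xA] / P_eta_A[xA]


def a_A_integral(xA):
    """Right-hand side of (I.3.2.6): a_joint averaged over x_{A'} under
    the conditional law P_{eta_{A'}|eta_A}(. | x_A)."""
    return sum(a_joint[x] * p / P_eta_A[xA]
               for x, p in P_eta.items()
               if tuple(x[i] for i in A) == xA)


def a_conditional(x):
    """Left-hand side of (I.3.2.8) for A_1 = {1}, A_2 = {2}: ratio of the
    conditional laws of the second coordinate given the first."""
    xA = tuple(x[i] for i in A)
    return (P_xi[x] / P_xi_A[xA]) / (P_eta[x] / P_eta_A[xA])


for xA in P_xi_A:
    print(xA, a_A_direct(xA), a_A_integral(xA))
for x in P_xi:
    # (I.3.2.8): joint derivative divided by the marginal derivative
    print(x, a_conditional(x), a_joint[x] / a_A_direct((x[0],)))
```

Any table works as long as `P_eta` is strictly positive, which guarantees the absolute continuity assumed in the Lemma.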


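In what follows we will be particularly interested in the case where $A_1 = \{1,2,\dots,k-1\}$, $A_2 = \{k\}$ $(1 \le k \le n)$, so that $\xi_{A_1} = \xi^{(k-1)}$, $\xi_{A_2} = \xi_k$, $\eta_{A_1} = \eta^{(k-1)}$, $\eta_{A_2} = \eta_k$.

From Lemma I.8 it follows that

$$P_{\xi_k|\xi^{(k-1)}}(\cdot \mid x^{(k-1)}) \qquad (I.3.2.14)$$

is absolutely continuous with respect to

$$P_{\eta_k|\eta^{(k-1)}}(\cdot \mid x^{(k-1)}) \qquad (I.3.2.14')$$

($P_{\xi^{(k-1)}}$-a.e.), and the corresponding Radon-Nikodym derivative is

$$a_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)}) = \frac{a_{\xi^{(k)}:\eta^{(k)}}(x^{(k)})}{a_{\xi^{(k-1)}:\eta^{(k-1)}}(x^{(k-1)})} \qquad (I.3.2.15)$$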


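Let

$$i_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \log a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) \qquad (I.3.2.16)$$

$$i_{\xi_k:\eta_k}(x_k) = \log a_{\xi_k:\eta_k}(x_k) \qquad (I.3.2.17)$$

$$i_{(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})}(x_k \mid x_{k-1}) = \log a_{(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})}(x_k \mid x_{k-1}) \qquad (I.3.2.18)$$

$$i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)}) = \log a_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)}) \qquad (I.3.2.19)$$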


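Lemma I.9.

(a) In each of the relations

$$a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \prod_{k=1}^{n} a_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)}) \qquad (I.3.2.20)$$

$$i_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \sum_{k=1}^{n} i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)}) \qquad (I.3.2.21)$$

if $n$ of the $n+1$ quantities are finite, then the $(n+1)$-th is also finite and the relation is satisfied.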
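(b) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ form a simple Markov chain and the random vectors $\eta_i$ $(1 \le i \le n)$ form a simple Markov chain, the relations (I.3.2.20), (I.3.2.21) take the simplified form

$$a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \prod_{k=1}^{n} a_{(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})}(x_k \mid x_{k-1}) \qquad (I.3.2.20')$$

$$i_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \sum_{k=1}^{n} i_{(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})}(x_k \mid x_{k-1}) \qquad (I.3.2.21')$$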


(c) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ are independent in their totality and the random vectors $\eta_i$ $(1 \le i \le n)$ are independent in their totality, the relations (I.3.2.20), (I.3.2.21) take the simplified form

$$a_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \prod_{k=1}^{n} a_{\xi_k:\eta_k}(x_k) \qquad (I.3.2.20'')$$

$$i_{\xi^{(n)}:\eta^{(n)}}(x^{(n)}) = \sum_{k=1}^{n} i_{\xi_k:\eta_k}(x_k) \qquad (I.3.2.21'')$$

Proof.


(a) From the identity

$$\frac{P_{\xi^{(n)}}(dx^{(n)})}{P_{\eta^{(n)}}(dx^{(n)})} = \prod_{k=1}^{n} \frac{P_{\xi_k|\xi^{(k-1)}}(dx_k \mid x^{(k-1)})}{P_{\eta_k|\eta^{(k-1)}}(dx_k \mid x^{(k-1)})} \qquad (I.3.2.22)$$

taking into consideration (I.3.2.5) and (I.3.2.15), relation (I.3.2.20) follows. From (I.3.2.20), taking into consideration (I.3.2.16) and (I.3.2.19), relation (I.3.2.21) follows.

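(b) Under the Markovian dependence indicated in the Lemma,

$$P_{\xi_k|\xi^{(k-1)}}(\cdot \mid x^{(k-1)}) = P_{\xi_k|\xi_{k-1}}(\cdot \mid x_{k-1}) \qquad (I.3.2.23)$$

$$P_{\eta_k|\eta^{(k-1)}}(\cdot \mid x^{(k-1)}) = P_{\eta_k|\eta_{k-1}}(\cdot \mid x_{k-1}) \qquad (I.3.2.24)$$

Consequently, the relations (I.3.2.20), (I.3.2.21) take the form (I.3.2.20'), (I.3.2.21').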
(c) Under the independence indicated in the Lemma,

$$P_{\xi_k|\xi^{(k-1)}}(\cdot \mid x^{(k-1)}) = P_{\xi_k}(\cdot) \qquad (I.3.2.23')$$

$$P_{\eta_k|\eta^{(k-1)}}(\cdot \mid x^{(k-1)}) = P_{\eta_k}(\cdot) \qquad (I.3.2.24')$$

Consequently, the relations (I.3.2.20), (I.3.2.21) take the form (I.3.2.20''), (I.3.2.21'').

The Lemma is proved.
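The factorization of Lemma I.9(a) can be checked by brute force on a finite space. The sketch below assumes (this is not in the text) $n = 3$ binary coordinates and two strictly positive joint laws drawn with a fixed seed; it computes each conditional derivative directly as a ratio of conditional probabilities, without using the quotient form (I.3.2.15), and compares the product and the sum with the left-hand sides of (I.3.2.20) and (I.3.2.21). All names are hypothetical.

```python
import itertools
import math
import random

# Hypothetical example: n = 3 binary coordinates with strictly positive
# joint laws, so P_xi^(n) is absolutely continuous w.r.t. P_eta^(n).
n = 3
rng = random.Random(0)
points = list(itertools.product((0, 1), repeat=n))


def random_law():
    weights = [rng.uniform(0.1, 1.0) for _ in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


P_xi, P_eta = random_law(), random_law()


def prefix_law(P, k):
    """Law of the first k coordinates (of xi^(k)); k = 0 gives {(): 1.0}."""
    out = {}
    for x, p in P.items():
        out[x[:k]] = out.get(x[:k], 0.0) + p
    return out


def cond_prob(P, x, k):
    """Conditional probability of x_k given x^(k-1) under the joint law P."""
    return prefix_law(P, k)[x[:k]] / prefix_law(P, k - 1)[x[:k - 1]]


def a_cond(x, k):
    """a_{(xi_k|xi^(k-1)):(eta_k|eta^(k-1))}(x_k | x^(k-1)): derivative of the
    measure (I.3.2.14) with respect to (I.3.2.14') on a finite space."""
    return cond_prob(P_xi, x, k) / cond_prob(P_eta, x, k)


for x in points:
    a_lhs = P_xi[x] / P_eta[x]                                    # (I.3.2.5)
    a_rhs = math.prod(a_cond(x, k) for k in range(1, n + 1))      # (I.3.2.20)
    i_rhs = sum(math.log(a_cond(x, k)) for k in range(1, n + 1))  # (I.3.2.21)
    print(x, a_lhs, a_rhs, math.log(a_lhs), i_rhs)
```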

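Let

$$h[(\xi_k|x^{(k-1)}):(\eta_k|x^{(k-1)})] = \int_{X_k} i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)})\, P_{\xi_k|\xi^{(k-1)}}(dx_k \mid x^{(k-1)}) \qquad (I.3.2.25)$$

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] = \int h[(\xi_k|x^{(k-1)}):(\eta_k|x^{(k-1)})]\, P_{\xi^{(k-1)}}(dx^{(k-1)}) \qquad (I.3.2.25')$$

Lemma I.10.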


(a) If $n$ of the $n+1$ quantities in the relation

$$h(\xi^{(n)}:\eta^{(n)}) = \sum_{k=1}^{n} h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] \qquad (I.3.2.26)$$

are finite, then the $(n+1)$-th one is also finite and the relation is satisfied.

(b) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ form a simple Markov chain and the random vectors $\eta_i$ $(1 \le i \le n)$ form a simple Markov chain, the relation (I.3.2.26) takes the simplified form

$$h(\xi^{(n)}:\eta^{(n)}) = \sum_{k=1}^{n} h[(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})] \qquad (I.3.2.26')$$


(c) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ are independent in their totality and the random vectors $\eta_i$ $(1 \le i \le n)$ are independent in their totality, the relation (I.3.2.26) takes the simplified form

$$h(\xi^{(n)}:\eta^{(n)}) = \sum_{k=1}^{n} h(\xi_k:\eta_k) \qquad (I.3.2.26'')$$

Here we denoted

$$h[(\xi_1|\xi^{(0)}):(\eta_1|\eta^{(0)})] = h(\xi_1:\eta_1) \qquad (I.3.2.27)$$

$$h[(\xi_1|\xi_0):(\eta_1|\eta_0)] = h(\xi_1:\eta_1) \qquad (I.3.2.27')$$


Proof. From Lemma I.12(b) with $A_1 = \{1,2,\dots,k-1\}$, $A_2 = B_k = \{k,k+1,\dots,n\}$, it follows that if $P_{\xi^{(n)}}$ is absolutely continuous with respect to $P_{\eta^{(n)}}$, then

$$P_{\xi_{B_k}|\xi^{(k-1)}}(\cdot \mid x^{(k-1)}) \qquad (I.3.2.28)$$

is absolutely continuous with respect to

$$P_{\eta_{B_k}|\eta^{(k-1)}}(\cdot \mid x^{(k-1)}) \qquad (I.3.2.28')$$

for $k = 2,3,\dots,n$. Let now

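$$\xi'_1 = \xi_k;\quad \xi'_2 = \xi_{B_{k+1}};\quad \eta'_1 = \eta_k;\quad \eta'_2 = \eta_{B_{k+1}};\quad B_k = \{k,\dots,n\} \qquad (I.3.2.29)$$

so that $\xi'_1\xi'_2 = \xi_{B_k}$ and $\eta'_1\eta'_2 = \eta_{B_k}$, and consequently from (I.3.1.28) it follows that

$$h[(\xi_{B_k}|\xi^{(k-1)}):(\eta_{B_k}|\eta^{(k-1)})] = h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] + h[(\xi_{B_{k+1}}|\xi^{(k)}):(\eta_{B_{k+1}}|\eta^{(k)})] \quad (1 \le k \le n-1) \qquad (I.3.2.30)$$

Moreover, since $B_n = \{n\}$,

$$h[(\xi_{B_n}|\xi^{(n-1)}):(\eta_{B_n}|\eta^{(n-1)})] = h[(\xi_n|\xi^{(n-1)}):(\eta_n|\eta^{(n-1)})] \qquad (I.3.2.31)$$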


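so that

$$\sum_{k=1}^{n-1} h[(\xi_{B_k}|\xi^{(k-1)}):(\eta_{B_k}|\eta^{(k-1)})] = \sum_{k=1}^{n-1} h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] + \sum_{k=1}^{n-1} h[(\xi_{B_{k+1}}|\xi^{(k)}):(\eta_{B_{k+1}}|\eta^{(k)})] \qquad (I.3.2.32)$$

where

$$h[(\xi_{B_1}|\xi^{(0)}):(\eta_{B_1}|\eta^{(0)})] = h(\xi^{(n)}:\eta^{(n)}) \qquad (I.3.2.33)$$

The terms with $k = 2,\dots,n-1$ on the left-hand side cancel against the same terms of the last sum; using (I.3.2.31) and (I.3.2.33), relation (I.3.2.26) follows from (I.3.2.32).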

(b) Under the Markovian dependence indicated in the Lemma, the relations (I.3.2.23), (I.3.2.24) are satisfied, and this implies

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] = h[(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1})] \qquad (I.3.2.34)$$

so that (I.3.2.26) takes the form (I.3.2.26').

(c) Under the independence indicated in the Lemma, the relations (I.3.2.23'), (I.3.2.24') are satisfied, and this implies

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})] = h(\xi_k:\eta_k) \qquad (I.3.2.35)$$

so that (I.3.2.26) takes the form (I.3.2.26'').
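For two homogeneous Markov chains on a finite state space, Lemma I.10(b) can be checked by brute force: the left-hand side of (I.3.2.26') is a sum over all $2^n$ paths, while the right-hand side needs only $h(\xi_1:\eta_1)$, the transition matrices, and the laws of $\xi_{k-1}$ used to average the conditional terms as in (I.3.2.25'). The chains, their parameters and the helper names below are hypothetical choices made only for this sketch.

```python
import itertools
import math

# Hypothetical example: two homogeneous Markov chains on {0, 1}, n = 4 steps.
n = 4
p_xi, T_xi = [0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]]    # law of xi_1, transitions
p_eta, T_eta = [0.5, 0.5], [[0.5, 0.5], [0.4, 0.6]]  # law of eta_1, transitions


def path_prob(p, T, x):
    """Joint probability of the path x = (x_1, ..., x_n) of a Markov chain."""
    prob = p[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= T[prev][cur]
    return prob


def kl(p, q):
    """h(xi:eta) = sum p log(p/q) for two laws on the same finite set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Left-hand side: relative entropy of the joint laws of the whole paths.
paths = list(itertools.product((0, 1), repeat=n))
h_joint = sum(path_prob(p_xi, T_xi, x)
              * math.log(path_prob(p_xi, T_xi, x) / path_prob(p_eta, T_eta, x))
              for x in paths)

# Right-hand side of (I.3.2.26'): h(xi_1:eta_1) plus, for k >= 2, the
# conditional terms averaged over the law of xi_{k-1}.
h_sum = kl(p_xi, p_eta)
marg = p_xi[:]  # law of xi_{k-1}, starting with the law of xi_1
for k in range(2, n + 1):
    h_sum += sum(marg[a] * kl(T_xi[a], T_eta[a]) for a in (0, 1))
    marg = [sum(marg[a] * T_xi[a][b] for a in (0, 1)) for b in (0, 1)]

print(h_joint, h_sum)
```

The path sum grows like $2^n$, while the right-hand side grows only linearly in $n$, which is the practical content of the Markov simplification.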
I.4. An extension

I.4.1. Let $\xi, \eta, \zeta$ be three random vectors taking the same values $x_1, \dots, x_n$, and let

$$P_\xi(x_i) = P(\xi = x_i);\quad P_\eta(x_i) = P(\eta = x_i);\quad P_\zeta(x_i) = P(\zeta = x_i) \quad (1 \le i \le n) \qquad (I.4.1.1)$$

In analogy to (I.1.1.2), the relative entropy of $\xi$ with respect to $\eta$ from the point of view of $\zeta$ (or of $P_\xi$ with respect to $P_\eta$ from the point of view of $P_\zeta$) is given by the expression

$$h(\xi:\eta;\zeta) = h(P_\xi:P_\eta;P_\zeta) = \sum_{i=1}^{n} P_\zeta(x_i) \log \frac{P_\xi(x_i)}{P_\eta(x_i)} \qquad (I.4.1.2)$$

where, for $a \ge 0$, we take $0 \log \frac{0}{a} = 0$.
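Formula (I.4.1.2) is a weighted sum of log-ratios, so it is easy to evaluate directly. The sketch below uses hypothetical three-point laws and a hypothetical function name; terms with $P_\zeta(x_i) = 0$ are dropped, and, as an assumption of this sketch (the text does not cover it), a term with $P_\zeta(x_i) > 0$ where $P_\xi$ or $P_\eta$ vanishes raises an error. Because the weights come from $P_\zeta$ rather than $P_\xi$, the result can be negative, unlike $h(\xi:\eta)$.

```python
import math


def rel_entropy_pov(p_xi, p_eta, p_zeta):
    """h(xi:eta;zeta) = sum_i P_zeta(x_i) log(P_xi(x_i)/P_eta(x_i)), (I.4.1.2).

    Terms with P_zeta(x_i) = 0 are taken to be 0. A term with P_zeta(x_i) > 0
    but P_xi(x_i) = 0 or P_eta(x_i) = 0 is rejected (assumption of this sketch).
    """
    total = 0.0
    for pz, px, pe in zip(p_zeta, p_xi, p_eta):
        if pz == 0:
            continue
        if px == 0 or pe == 0:
            raise ValueError("log(P_xi/P_eta) undefined where P_zeta > 0")
        total += pz * math.log(px / pe)
    return total


p_xi = [0.5, 0.3, 0.2]
p_eta = [0.2, 0.3, 0.5]
p_zeta = [0.1, 0.1, 0.8]
print(rel_entropy_pov(p_xi, p_eta, p_zeta))  # weighted by zeta: negative here
print(rel_entropy_pov(p_xi, p_eta, p_xi))    # zeta = xi: ordinary h(xi:eta)
```

Setting `p_zeta = p_xi` reproduces the ordinary relative entropy (I.1.1.2), in line with (I.4.2.4).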

I.4.2. Now let $(\Omega, \Sigma, P)$ be a probability space, where $\Omega$ is a set of elements $\omega$, $\Sigma$ a $\sigma$-algebra of subsets of $\Omega$, and $P$ a probability measure on $\Sigma$. We consider three random vectors $\xi, \eta, \zeta$ defined on this probability space, with values in the measure space $(X, S, \mu)$, where $X$ is a set of elements $x$, $S$ a $\sigma$-algebra of subsets of $X$, and $\mu$ a measure on $S$. Let

$$P_\xi(T) = P\{\omega : \xi(\omega) \in T\};\quad P_\eta(T) = P\{\omega : \eta(\omega) \in T\};\quad P_\zeta(T) = P\{\omega : \zeta(\omega) \in T\},\quad T \in S \qquad (I.4.2.1)$$

be their probability measures.

With $a_{\xi:\eta}(x)$ defined in (I.1.8.2) and $i_{\xi:\eta}(x)$ defined in (I.1.8.4), in analogy to (I.1.8.3) we define the quantity

$$h(\xi:\eta;\zeta) = \int_X i_{\xi:\eta}(x)\, P_\zeta(dx) \qquad (I.4.2.2)$$

as the relative entropy of $\xi$ with respect to $\eta$ from the point of view of $\zeta$.

In the particular case where the probability measures $P_\xi$, $P_\eta$, $P_\zeta$ are defined in terms of densities $\pi_\xi(x)$, $\pi_\eta(x)$, $\pi_\zeta(x)$ with respect to a measure $\mu$, the integral formula (I.4.2.2) reduces to

$$h(\xi:\eta;\zeta) = \int_X \pi_\zeta(x) \log \frac{\pi_\xi(x)}{\pi_\eta(x)}\, dx \qquad (I.4.2.3)$$

where the integration is with respect to the measure $\mu$. Obviously, if $\zeta = \xi$, the expression (I.4.1.2) reduces to (I.1.1.2), the expression (I.4.2.2) reduces to (I.1.8.3), and (I.4.2.3) reduces to (I.1.8.5), i.e.

$$h(\xi:\eta;\xi) = h(\xi:\eta) \qquad (I.4.2.4)$$
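As an illustration of (I.4.2.3) and (I.4.2.4), assume (this is not in the text) that $X$ is the real line, $\mu$ is Lebesgue measure, and $\xi, \eta, \zeta$ are normal with parameters $(m, s)$. Then $\log(\pi_\xi/\pi_\eta)$ is a quadratic polynomial in $x$, and $E_\zeta[(x - m)^2] = s_\zeta^2 + (m_\zeta - m)^2$ gives a closed form. The sketch compares it with a midpoint-rule approximation of the integral; the function names and parameters are hypothetical. With $\zeta = \xi$ the closed form becomes the usual relative entropy of two normal laws.

```python
import math


def normal_pdf(x, m, s):
    """Density of N(m, s^2) with respect to Lebesgue measure."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def log_ratio(x, xi, eta):
    """log pi_xi(x) - log pi_eta(x) for normal laws xi = (m1, s1), eta = (m2, s2)."""
    (m1, s1), (m2, s2) = xi, eta
    return math.log(s2 / s1) - 0.5 * ((x - m1) / s1) ** 2 + 0.5 * ((x - m2) / s2) ** 2


def h_numeric(xi, eta, zeta, lo=-30.0, hi=30.0, steps=60_000):
    """Midpoint-rule approximation of (I.4.2.3) on the interval [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * dx
        total += normal_pdf(x, *zeta) * log_ratio(x, xi, eta) * dx
    return total


def h_closed_form(xi, eta, zeta):
    """Exact value of (I.4.2.3) for normal laws."""
    (m1, s1), (m2, s2), (m3, s3) = xi, eta, zeta
    return (math.log(s2 / s1)
            - (s3 ** 2 + (m3 - m1) ** 2) / (2 * s1 ** 2)
            + (s3 ** 2 + (m3 - m2) ** 2) / (2 * s2 ** 2))


xi, eta, zeta = (0.0, 1.0), (1.0, 2.0), (0.5, 1.5)
print(h_numeric(xi, eta, zeta), h_closed_form(xi, eta, zeta))
# zeta = xi gives the ordinary relative entropy h(xi:eta), as in (I.4.2.4)
print(h_numeric(xi, eta, xi), h_closed_form(xi, eta, xi))
```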


In analogy with (I.3.1.26) we define the quantity

$$h[(\xi_2|x_1):(\eta_2|x_1);(\zeta_2|x_1)] = \int_{X_2} i_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_2 \mid x_1)\, P_{\zeta_2|\zeta_1}(dx_2 \mid x_1) \qquad (I.4.2.5)$$

and in analogy with (I.3.1.27) we define the quantity

$$h[(\xi_2|\xi_1):(\eta_2|\eta_1);(\zeta_2|\zeta_1)] = \int_{X_1} h[(\xi_2|x_1):(\eta_2|x_1);(\zeta_2|x_1)]\, P_{\zeta_1}(dx_1) \qquad (I.4.2.6)$$

This quantity is the relative conditional entropy of the ordered pair of random vectors $\xi_1, \xi_2$ with respect to the ordered pair of random vectors $\eta_1, \eta_2$ from the point of view of $\zeta_1, \zeta_2$.

Lemma I.11.

(a) If two of the three quantities in the relation

$$h[(\xi_1\xi_2):(\eta_1\eta_2);(\zeta_1\zeta_2)] = h(\xi_1:\eta_1;\zeta_1) + h[(\xi_2|\xi_1):(\eta_2|\eta_1);(\zeta_2|\zeta_1)] \qquad (I.4.2.7)$$

are finite, then the third one is also finite and they satisfy this relation.

(b) In the particular case where $\xi_1, \xi_2$ are independent and $\eta_1, \eta_2$ are independent, the relation (I.4.2.7) takes the simplified form

$$h[(\xi_1\xi_2):(\eta_1\eta_2);(\zeta_1\zeta_2)] = h(\xi_1:\eta_1;\zeta_1) + h(\xi_2:\eta_2;\zeta_2) \qquad (I.4.2.7')$$

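Proof. Integrating both sides of relation (I.3.1.23) with respect to $P_{\zeta_1\zeta_2}(dx_1, dx_2)$, it follows from Fubini's theorem that

$$h[(\xi_1\xi_2):(\eta_1\eta_2);(\zeta_1\zeta_2)] = \int_{X_1 \times X_2} i_{(\xi_1\xi_2):(\eta_1\eta_2)}(x_1, x_2)\, P_{\zeta_1\zeta_2}(dx_1, dx_2) = \int_{X_1} i_{\xi_1:\eta_1}(x_1)\, P_{\zeta_1}(dx_1) + \int_{X_1} \left[ \int_{X_2} i_{(\xi_2|\xi_1):(\eta_2|\eta_1)}(x_2 \mid x_1)\, P_{\zeta_2|\zeta_1}(dx_2 \mid x_1) \right] P_{\zeta_1}(dx_1) \qquad (I.4.2.8)$$

Taking into consideration (I.4.2.2), (I.4.2.5), (I.4.2.6), we obtain (I.4.2.7). Taking into consideration (I.3.1.25), (I.4.2.7') follows from (I.4.2.7).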
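Both parts of Lemma I.11 can be checked on a finite space. The sketch below assumes (not from the text) pairs with values in $\{0,1,2\}^2$ and strictly positive laws drawn with a fixed seed; the conditional term is assembled literally from (I.4.2.5) and (I.4.2.6). The second check uses product laws for the $\xi$'s and the $\eta$'s while $\zeta_1, \zeta_2$ stay dependent, which is all that part (b) requires. All names are hypothetical.

```python
import itertools
import math
import random

# Hypothetical example: pairs of random vectors with values in {0, 1, 2}^2.
rng = random.Random(1)
pairs = list(itertools.product((0, 1, 2), repeat=2))


def random_law():
    weights = [rng.uniform(0.1, 1.0) for _ in pairs]
    total = sum(weights)
    return {x: w / total for x, w in zip(pairs, weights)}


def component(P, j):
    """Law of component j (0 or 1) of a pair with joint law P."""
    out = {}
    for x, p in P.items():
        out[x[j]] = out.get(x[j], 0.0) + p
    return out


def h_pov(px, pe, pz):
    """h(xi:eta;zeta) of (I.4.1.2) for laws given as dictionaries."""
    return sum(pz[v] * math.log(px[v] / pe[v]) for v in pz)


def h_cond(Px, Pe, Pz):
    """h[(xi2|xi1):(eta2|eta1);(zeta2|zeta1)] built from (I.4.2.5), (I.4.2.6)."""
    fx, fe, fz = component(Px, 0), component(Pe, 0), component(Pz, 0)
    total = 0.0
    for a in fz:
        inner = 0.0  # (I.4.2.5) at x1 = a
        for (x1, x2) in pairs:
            if x1 == a:
                i_val = math.log((Px[(a, x2)] / fx[a]) / (Pe[(a, x2)] / fe[a]))
                inner += (Pz[(a, x2)] / fz[a]) * i_val
        total += fz[a] * inner  # integrate against P_zeta1, as in (I.4.2.6)
    return total


P_xi, P_eta, P_zeta = random_law(), random_law(), random_law()
lhs = h_pov(P_xi, P_eta, P_zeta)  # left-hand side of (I.4.2.7)
rhs = (h_pov(component(P_xi, 0), component(P_eta, 0), component(P_zeta, 0))
       + h_cond(P_xi, P_eta, P_zeta))
print(lhs, rhs)

# Part (b): xi1, xi2 independent and eta1, eta2 independent; zeta stays dependent.
q1, q2 = {0: 0.2, 1: 0.3, 2: 0.5}, {0: 0.6, 1: 0.1, 2: 0.3}
r1, r2 = {0: 0.3, 1: 0.3, 2: 0.4}, {0: 0.1, 1: 0.45, 2: 0.45}
Px_ind = {(a, b): q1[a] * q2[b] for a, b in pairs}
Pe_ind = {(a, b): r1[a] * r2[b] for a, b in pairs}
lhs_b = h_pov(Px_ind, Pe_ind, P_zeta)
rhs_b = (h_pov(q1, r1, component(P_zeta, 0))
         + h_pov(q2, r2, component(P_zeta, 1)))
print(lhs_b, rhs_b)
```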

I.4.3. In what follows, we use the concepts and notation of I.3.2.

Similarly to the random vectors $\xi_i$, $\eta_i$, we consider the random vectors $\zeta_i$ defined on the same space and with values in the same measurable space $(X_i, S_i)$ $(1 \le i \le n)$. Similarly to the random vectors $\xi_A(\omega_A)$, $\eta_A(\omega_A)$, we consider the random vector

$$\zeta_A(\omega_A) = \{\zeta_i(\omega_i),\ i \in A\} \qquad (I.4.3.1)$$

defined on the same space and with values in the same measurable space $(X_A, S_A)$.

If $A = \{1,2,\dots,k\}$, let $\zeta_A = \zeta^{(k)}$.

Similarly to $P_{\xi_i}(T_i)$, $P_{\eta_i}(T_i)$ in (I.3.2.3), (I.3.2.3'), we define

$$P_{\zeta_i}(T_i) = P_i\{\omega_i : \zeta_i(\omega_i) \in T_i\},\quad T_i \in S_i \qquad (I.4.3.2)$$

as the probability measure of the random vector $\zeta_i$ $(1 \le i \le n)$, and similarly to $P_{\xi_A}(T_A)$, $P_{\eta_A}(T_A)$ in (I.3.2.4), (I.3.2.4'), we define

$$P_{\zeta_A}(T_A) = P_A\{\omega_A : \zeta_A(\omega_A) \in T_A\},\quad T_A \in S_A \qquad (I.4.3.3)$$


Similarly to (I.3.2.14), (I.3.2.14'), the measure

$$P_{\zeta_k|\zeta^{(k-1)}}(\cdot \mid x^{(k-1)}) \qquad (I.4.3.4)$$

is a conditional marginal measure of $P_{\zeta^{(n)}}$.


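Let

$$h[(\xi_k|x^{(k-1)}):(\eta_k|x^{(k-1)});(\zeta_k|x^{(k-1)})] = \int_{X_k} i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)})\, P_{\zeta_k|\zeta^{(k-1)}}(dx_k \mid x^{(k-1)})$$

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)});(\zeta_k|\zeta^{(k-1)})] = \int h[(\xi_k|x^{(k-1)}):(\eta_k|x^{(k-1)});(\zeta_k|x^{(k-1)})]\, P_{\zeta^{(k-1)}}(dx^{(k-1)}) \qquad (I.4.3.5)$$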


Lemma I.12.

(a) If $n$ of the $n+1$ quantities in the relation

$$h(\xi^{(n)}:\eta^{(n)};\zeta^{(n)}) = \sum_{k=1}^{n} h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)});(\zeta_k|\zeta^{(k-1)})] \qquad (I.4.3.6)$$

are finite, then the $(n+1)$-th one is also finite and the relation is satisfied.

(b) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ form a simple Markov chain and the random vectors $\eta_i$ $(1 \le i \le n)$ form a simple Markov chain, the relation (I.4.3.6) takes the simplified form

$$h(\xi^{(n)}:\eta^{(n)};\zeta^{(n)}) = \sum_{k=1}^{n} h[(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1});(\zeta_k|\zeta_{k-1})] \qquad (I.4.3.6')$$

(c) In the case where the random vectors $\xi_i$ $(1 \le i \le n)$ are independent in their totality and the random vectors $\eta_i$ $(1 \le i \le n)$ are also independent in their totality, the relation (I.4.3.6) takes the simplified form

$$h(\xi^{(n)}:\eta^{(n)};\zeta^{(n)}) = \sum_{k=1}^{n} h(\xi_k:\eta_k;\zeta_k) \qquad (I.4.3.6'')$$

Here we denoted $h[(\xi_1|\xi^{(0)}):(\eta_1|\eta^{(0)});(\zeta_1|\zeta^{(0)})] = h[(\xi_1|\xi_0):(\eta_1|\eta_0);(\zeta_1|\zeta_0)] = h(\xi_1:\eta_1;\zeta_1)$.


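Proof.

(a) Integrating both sides of (I.3.2.21) with respect to $P_{\zeta^{(n)}}$, it follows from Fubini's theorem that, with $B_{k+1} = \{k+1,\dots,n\}$,

$$h(\xi^{(n)}:\eta^{(n)};\zeta^{(n)}) = \int i_{\xi^{(n)}:\eta^{(n)}}(x^{(n)})\, P_{\zeta^{(n)}}(dx^{(n)}) = \sum_{k=1}^{n} \int i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)})\, P_{\zeta^{(n)}}(dx^{(k)}, dx_{B_{k+1}})$$

$$= \sum_{k=1}^{n} \int i_{(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)})}(x_k \mid x^{(k-1)})\, P_{\zeta^{(k)}}(dx^{(k)}) = \sum_{k=1}^{n} h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)});(\zeta_k|\zeta^{(k-1)})] \qquad (I.4.3.7)$$

since the $k$-th integrand does not depend on $x_{B_{k+1}}$, so integrating over $X_{B_{k+1}}$ leaves the marginal $P_{\zeta^{(k)}}$; the last equality is (I.4.3.5).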


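(b) Under the Markovian dependence indicated in the Lemma, the relations (I.3.2.23), (I.3.2.24) are satisfied, and this implies

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)});(\zeta_k|\zeta^{(k-1)})] = h[(\xi_k|\xi_{k-1}):(\eta_k|\eta_{k-1});(\zeta_k|\zeta_{k-1})] \qquad (I.4.3.8)$$

so that (I.4.3.6) takes the form (I.4.3.6').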

(c) Under the independence indicated in the Lemma, the relations (I.3.2.23'), (I.3.2.24') are satisfied, and this implies

$$h[(\xi_k|\xi^{(k-1)}):(\eta_k|\eta^{(k-1)});(\zeta_k|\zeta^{(k-1)})] = h(\xi_k:\eta_k;\zeta_k) \qquad (I.4.3.9)$$

so that (I.4.3.6) takes the form (I.4.3.6'').
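The $n$-fold relation (I.4.3.6) can be checked the same way on a finite space. The sketch below assumes $n = 3$ binary coordinates and three strictly positive joint laws drawn with a fixed seed (hypothetical data, not from the text). Each term is computed as one integral against the marginal $P_{\zeta^{(k)}}$, which on a finite space equals the two-step integration of (I.4.3.5).

```python
import itertools
import math
import random

# Hypothetical example: n = 3 binary coordinates, strictly positive joint laws.
n = 3
rng = random.Random(2)
points = list(itertools.product((0, 1), repeat=n))


def random_law():
    weights = [rng.uniform(0.1, 1.0) for _ in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}


P_xi, P_eta, P_zeta = random_law(), random_law(), random_law()


def prefix_law(P, k):
    """Law of the first k coordinates; k = 0 gives {(): 1.0}."""
    out = {}
    for x, p in P.items():
        out[x[:k]] = out.get(x[:k], 0.0) + p
    return out


def i_cond(xk, k):
    """i_{(xi_k|xi^(k-1)):(eta_k|eta^(k-1))}(x_k | x^(k-1)) for a prefix xk = x^(k)."""
    px = prefix_law(P_xi, k)[xk] / prefix_law(P_xi, k - 1)[xk[:-1]]
    pe = prefix_law(P_eta, k)[xk] / prefix_law(P_eta, k - 1)[xk[:-1]]
    return math.log(px / pe)


def h_term(k):
    """k-th term of (I.4.3.6). Integrating against P_{zeta_k|zeta^(k-1)} and then
    against P_{zeta^(k-1)}, as in (I.4.3.5), equals one integral against the
    marginal law P_{zeta^(k)}, which is what is computed here."""
    return sum(p * i_cond(xk, k) for xk, p in prefix_law(P_zeta, k).items())


h_joint = sum(P_zeta[x] * math.log(P_xi[x] / P_eta[x]) for x in points)
h_sum = sum(h_term(k) for k in range(1, n + 1))
print(h_joint, h_sum)
```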


I.5. Comments

Lemma I.1 - is a known result (see [9], page 17, Ex. 3.2), proved here
by typical elementary means. The Author has not seen this
proof elsewhere. In [9] it is proved in a complicated way,
and in [1] in an even more complicated way.

Lemma I.2 - belongs to the Author; the proof uses the method used in
[3], page 203, in a particular case.

Lemma I.3 - in the case r = 2, part a) is proved in [1]. For r > 2,
part a) and part b), together with the proofs, belong to the
Author.

Theorem I.3 - is presented in [11] without proof, references being made
to a particular case in [1]. Our proof follows A. Feinstein's
remarks to Ch. 2 in [11], with many additions and clarifications.

Theorem I.4 - is presented in [11] without proof. The proof given here
belongs to the Author.

Theorem I.4' - belongs to the Author.

Lemma I.5 - belongs to the Author.

Lemma I.6 - belongs to the Author.

Lemma I.7 - belongs to the Author.

Theorem I.7 - is presented in [11] without proof; this proof belongs to
the Author.

Theorem I.6 - belongs to the Author.

Theorem I.7 - can be found in [11], but the proof has been partially
changed for the sake of clarity.

Theorem I.8 - Part a) can be found in [8]. While following the proof
of a) in [8], the Author changed the order of presentation
for the sake of clarity. Part b) is an improvement
of a statement in [8] and belongs to the Author, even
though methods from [8] are used.
Theorem I.9 - while following the proof in [8], the Author changed the
order of presentation for the sake of clarity.

Theorem I.10 - follows [8], with some clarifications in the proof.

Lemma I.5 - belongs to the Author.

Lemma I.6 - belongs to the Author.

Lemma I.7 - belongs to the Author. See also [12].

Lemma I.8 - belongs to the Author.

Lemma I.9 - belongs to the Author.

Lemma I.10 - belongs to the Author.

Lemma I.11 - belongs to the Author. See also [15].

Lemma I.12 - belongs to the Author.